-
Exact convergence rates to derivatives of local time for some self-similar Gaussian processes
Authors:
Minhao Hong
Abstract:
In this article, for some $d$-dimensional Gaussian processes
\[X=\big\{X_t=(X^1_t,\cdots,X^d_t):t\ge0\big\},\]
whose components are i.i.d. one-dimensional self-similar Gaussian processes with Hurst index $H\in(0,1)$, we consider the asymptotic behavior of the approximation of its $\boldsymbol{k}$-th derivatives of local time under certain mild conditions, where
$\boldsymbol{k}=(k_1,\cdots,k_d)$ and the $k_\ell$ are non-negative real numbers. We give a derivative version of the limit theorems for functionals of Gaussian processes and use this result to obtain the asymptotic behaviors.
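To make "approximation of local time" concrete, here is a minimal sketch under stated assumptions, not the paper's scheme: it takes $d=1$, uses fractional Brownian motion (one example of a self-similar Gaussian process with Hurst index $H$), and uses the common Gaussian-kernel approximation $L_\varepsilon(t,x)=\int_0^t p_\varepsilon(X_s-x)\,ds$ with $p_\varepsilon$ the $N(0,\varepsilon)$ density, only for $\boldsymbol{k}=0$. The function names `fbm_path` and `smoothed_local_time` are hypothetical.

```python
import numpy as np


def fbm_path(n, hurst, t_end=1.0, seed=0):
    """Sample fractional Brownian motion at t_k = k * t_end / n, k = 1..n, via Cholesky."""
    t = np.linspace(t_end / n, t_end, n)
    s, u = np.meshgrid(t, t, indexing="ij")
    # fBm covariance R(s, u) = (s^{2H} + u^{2H} - |s - u|^{2H}) / 2
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter for numerical safety
    rng = np.random.default_rng(seed)
    return t, chol @ rng.standard_normal(n)


def smoothed_local_time(path, dt, x, eps):
    """Riemann-sum approximation of L_eps(t, x) = int_0^t p_eps(X_s - x) ds,
    where p_eps is the N(0, eps) density (eps is a variance)."""
    y = path - x
    kernel = np.exp(-y ** 2 / (2.0 * eps)) / np.sqrt(2.0 * np.pi * eps)
    return float(kernel.sum() * dt)
```

A quick sanity check is the occupation-time identity: summing $L_\varepsilon(t,x)$ over a fine grid of levels $x$ recovers the elapsed time $t$. Derivative versions of such approximations would replace $p_\varepsilon$ by a derivative in $x$; that step, and the paper's conditions on $\boldsymbol{k}$, are not reproduced here.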
Submitted 7 July, 2024;
originally announced July 2024.
-
Single Image Rolling Shutter Removal with Diffusion Models
Authors:
Zhanglei Yang,
Haipeng Li,
Mingbo Hong,
Bing Zeng,
Shuaicheng Liu
Abstract:
We present RS-Diffusion, the first diffusion-model-based method for single-frame Rolling Shutter (RS) correction. RS artifacts compromise the visual quality of frames due to the row-wise exposure of CMOS sensors. Most previous methods have focused on multi-frame approaches, using temporal information from consecutive frames for motion rectification. However, few approaches address the more challenging but important single-frame RS correction. In this work, we present an ``image-to-motion'' framework via diffusion techniques, with a designed patch-attention module. In addition, we present the RS-Real dataset, comprising captured RS frames alongside their corresponding Global Shutter (GS) ground-truth pairs. The GS frames are corrected from the RS ones, guided by the corresponding Inertial Measurement Unit (IMU) gyroscope data acquired during capture. Experiments show that RS-Diffusion surpasses previous single-frame RS correction methods. Our method and the proposed RS-Real dataset lay a solid foundation for advancing the field of RS correction.
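Why row-wise exposure distorts a frame can be seen in a toy forward model. This is a sketch under loudly stated assumptions (purely horizontal motion at a constant, known velocity; integer pixel shifts; wrap-around at borders), not RS-Diffusion: row $r$ is exposed at $t_r = r\cdot\text{readout}/\text{rows}$, so it is displaced by $v\,t_r$. In the single-frame setting these per-row shifts are unknown and play the role of the motion an "image-to-motion" method must infer. The helper names are hypothetical.

```python
import numpy as np


def apply_rolling_shutter(frame, velocity_px_per_s, readout_s):
    """Toy RS model: row r is exposed at t_r = r * readout_s / rows, so a scene moving
    horizontally at constant velocity appears shifted by round(velocity * t_r) pixels."""
    rows = frame.shape[0]
    shifts = np.rint(velocity_px_per_s * readout_s * np.arange(rows) / rows).astype(int)
    distorted = np.stack([np.roll(frame[r], shifts[r]) for r in range(rows)])
    return distorted, shifts


def correct_rolling_shutter(distorted, shifts):
    """Undo the per-row displacement once the motion (here: the shifts) is known."""
    return np.stack([np.roll(distorted[r], -shifts[r]) for r in range(distorted.shape[0])])
```

With the shifts known, correction is exact in this toy; real scenes add rotation, depth-dependent motion, and occlusions, which is why learned single-frame methods are needed.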
Submitted 3 July, 2024;
originally announced July 2024.
-
Multi-Objective Optimization for Common-Centroid Placement of Analog Transistors
Authors:
Supriyo Maji,
Hyungjoo Park,
Gi moon Hong,
Souradip Poddar,
David Z. Pan
Abstract:
In analog circuits, process variation can cause unpredictability in circuit performance. Common-centroid (CC) type layouts have been shown to mitigate process-induced variations and are widely used to match circuit elements. Nevertheless, selecting the most suitable CC topology necessitates careful consideration of important layout constraints. Manual handling of these constraints becomes challenging, especially for large problems. State-of-the-art CC placement methods lack an optimization framework to handle important layout constraints collectively. They also require manual effort, and consequently the solutions can be suboptimal. To address this, we propose a unified framework based on multi-objective optimization for CC placement of analog transistors. Our method handles various constraints, including degree of dispersion, routing complexity, diffusion sharing, and layout-dependent effects. The multi-objective optimization provides better handling of the objectives when compared to single-objective optimization. Moreover, compared to existing methods, our method explores more CC topologies. Post-layout simulation results show better performance compared to state-of-the-art techniques in generating CC layouts.
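The defining property of a common-centroid layout is that the unit cells of every matched device share the same centroid. The sketch below only checks that property on a grid of unit labels (with "." as an empty slot, a hypothetical convention); it does not model the paper's objectives such as dispersion, routing complexity, diffusion sharing, or layout-dependent effects.

```python
from collections import defaultdict


def centroids(grid):
    """Return {device: (row_centroid, col_centroid)} for a grid of unit labels ('.' = empty)."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # running sums of row, col, and unit count
    for r, row in enumerate(grid):
        for c, dev in enumerate(row):
            if dev == ".":
                continue
            acc[dev][0] += r
            acc[dev][1] += c
            acc[dev][2] += 1
    return {d: (a[0] / a[2], a[1] / a[2]) for d, a in acc.items()}


def is_common_centroid(grid, tol=1e-9):
    """True if all devices in the grid share one centroid (within tol)."""
    cents = list(centroids(grid).values())
    r0, c0 = cents[0]
    return all(abs(r - r0) <= tol and abs(c - c0) <= tol for r, c in cents)
```

For example, the interdigitated 2x4 pattern `["ABBA", "BAAB"]` is common-centroid (both devices centre at row 0.5, column 1.5), while the single row `["AABB"]` is not. Many CC patterns satisfy this check, which is why choosing among them requires the further objectives listed above.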
Submitted 30 June, 2024;
originally announced July 2024.
-
EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration
Authors:
Ye Wang,
Jiahao Xun,
Minjie Hong,
Jieming Zhu,
Tao **,
Wang Lin,
Haoyuan Li,
Linjun Li,
Yan Xia,
Zhou Zhao,
Zhenhua Dong
Abstract:
Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this limitation, we introduce EAGER, a novel generative recommendation framework that seamlessly integrates both behavioral and semantic information. Specifically, we identify three key challenges in combining these two types of information: a unified generative architecture capable of handling two feature types, ensuring sufficient and independent learning for each type, and fostering subtle interactions that enhance collaborative information utilization. To achieve these goals, we propose (1) a two-stream generation architecture leveraging a shared encoder and two separate decoders to decode behavior tokens and semantic tokens with a confidence-based ranking strategy; (2) a global contrastive task with summary tokens to achieve discriminative decoding for each type of information; and (3) a semantic-guided transfer task designed to implicitly promote cross-interactions through reconstruction and estimation objectives. We validate the effectiveness of EAGER on four public benchmarks, demonstrating its superior performance compared to existing methods.
Submitted 3 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge
Authors:
Yizhen Luo,
Kai Yang,
Massimo Hong,
Xing Yi Liu,
Zikun Nie,
Hao Zhou,
Zaiqing Nie
Abstract:
Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular representations, due to challenges in explicitly incorporating view information and handling molecular knowledge from heterogeneous sources. To address these issues, we present MV-Mol, a molecular representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs. We utilize text prompts to model view information and design a fusion architecture to extract view-based molecular representations. We develop a two-stage pre-training procedure, exploiting heterogeneous data of varying quality and quantity. Through extensive experiments, we show that MV-Mol provides improved representations that substantially benefit molecular property prediction. Additionally, MV-Mol exhibits state-of-the-art performance in multi-modal comprehension of molecular structures and texts. Code and data are available at https://github.com/PharMolix/OpenBioMed.
Submitted 14 June, 2024;
originally announced June 2024.
-
Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback
Authors:
Chenliang Li,
Siliang Zeng,
Zeyi Liao,
Jiaxiang Li,
Dongyeop Kang,
Alfredo Garcia,
Mingyi Hong
Abstract:
Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning from human feedback (RLHF) break the task down into successive stages, such as supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL), each performing one specific learning task. Such a sequential approach results in serious issues such as significant under-utilization of data and distribution mismatch between the learned reward model and the generated policy, which eventually lead to poor alignment performance. We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF), capable of integrating both human preference and demonstration to train reward models and the policy. The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms such as RLHF and Direct Preference Optimization (DPO), and only requires minor changes to the existing alignment pipelines. We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo. We observe that the proposed solutions outperform the existing alignment algorithms such as RLHF and DPO by large margins, especially when the amount of high-quality preference data is relatively limited.
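Since AIHF is described as reducing to and leveraging DPO, the standard per-pair DPO objective is a useful reference point. The sketch below is that standard loss, $-\log\sigma\big(\beta[(\log\pi(y_w)-\log\pi_{\text{ref}}(y_w))-(\log\pi(y_l)-\log\pi_{\text{ref}}(y_l))]\big)$, taking sequence log-probabilities as plain floats; it is not the AIHF objective, and the function name and default `beta` are illustrative.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities
    under the trained policy and under the frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) = softplus(-margin), evaluated in an overflow-safe way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is $\log 2$; raising the chosen response's relative log-probability drives it below that.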
Submitted 19 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
Authors:
Andi Han,
Jiaxiang Li,
Wei Huang,
Mingyi Hong,
Akiko Takeda,
Pratik Jawanpuria,
Bamdev Mishra
Abstract:
Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. While being simple, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.
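The parameterization described above, $W = BA + S$ with a low-rank product $BA$ and a sparse $S$ whose support is drawn uniformly at random once and then frozen, can be sketched in NumPy as follows. The initialization scales, the zero-initialized sparse values, and the helper names are hypothetical choices for illustration, not SLTrain's implementation.

```python
import numpy as np


def init_sparse_plus_low_rank(m, n, rank, density, seed=0):
    """Hypothetical init for W = B @ A + S with a fixed, uniformly random sparse support."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((m, rank)) / np.sqrt(rank)    # trainable low-rank factor
    A = rng.standard_normal((rank, n)) / np.sqrt(n)       # trainable low-rank factor
    nnz = int(round(density * m * n))
    support = rng.choice(m * n, size=nnz, replace=False)  # fixed support, never re-drawn
    values = np.zeros(nnz)                                # only these entries of S are trained
    return B, A, support, values


def dense_weight(B, A, support, values):
    """Materialize W = B @ A + S, where S is zero outside the fixed support."""
    m, n = B.shape[0], A.shape[1]
    S = np.zeros(m * n)
    S[support] = values
    return B @ A + S.reshape(m, n)
```

For a 64x64 layer with rank 4 and 3% density, the trainable parameters are $4\cdot(64+64)+123 = 635$ instead of $4096$, which illustrates the "minimal extra parameters" point: the sparse part adds only its non-zero values.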
Submitted 4 June, 2024;
originally announced June 2024.
-
Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization
Authors:
Zhiwei Tang,
Jiangweizhi Peng,
Jiasheng Tang,
Mingyi Hong,
Fan Wang,
Tsung-Hui Chang
Abstract:
In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with certain probability regularization. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.
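The core idea of optimizing the injected noise rather than the model weights can be shown with a toy. In this sketch (all assumptions), a fixed linear map $x = Gz$ stands in for a differentiable diffusion sampler and $r(x) = -\lVert x - \text{target}\rVert^2$ stands in for the reward. The penalty $\lambda(\lVert z\rVert^2 - d)^2$ is a hypothetical regularizer motivated by the fact that $N(0, I_d)$ noise concentrates near $\lVert z\rVert^2 \approx d$; it is not the paper's probability regularization.

```python
import numpy as np


def dno_toy(steps=500, lr=0.05, reg=0.01, seed=0):
    """Gradient ascent on the injected noise z of a toy linear 'sampler' x = G z."""
    rng = np.random.default_rng(seed)
    d, d_out = 16, 8
    G = rng.standard_normal((d_out, d)) / 4.0  # stand-in for a differentiable sampler
    target = np.ones(d_out)

    def reward(z):
        return -np.sum((G @ z - target) ** 2)

    z = rng.standard_normal(d)  # injected noise, initially drawn from N(0, I)
    r0 = reward(z)
    for _ in range(steps):
        grad_reward = -2.0 * G.T @ (G @ z - target)
        # hypothetical penalty reg * (||z||^2 - d)^2 keeps z near the typical set of N(0, I_d)
        grad_penalty = 4.0 * reg * (z @ z - d) * z
        z = z + lr * (grad_reward - grad_penalty)
    return float(r0), float(reward(z)), float(z @ z)
```

The model itself is never updated, which is the sense in which such an approach is tuning-free; dropping the penalty (`reg=0`) lets $z$ drift wherever the reward pulls it, a toy analogue of the out-of-distribution reward hacking described above.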
Submitted 3 July, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Authors:
Jiaxiang Li,
Siliang Zeng,
Hoi-To Wai,
Chenliang Li,
Alfredo Garcia,
Mingyi Hong
Abstract:
Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning (SFT), where the model is fine-tuned by learning from human demonstration data; 2) preference learning, where preference data is used to learn a reward model, which is in turn used by a reinforcement learning (RL) step to fine-tune the model. Such a reward model serves as a proxy for human preference, and it is critical for guiding the RL step towards improving model quality. In this work, we argue that the SFT stage significantly benefits from learning a reward model as well. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to (explicitly or implicitly) build a reward model while learning the policy model. This approach leads to new SFT algorithms that are not only efficient to implement, but also promote the ability to distinguish between preferred and non-preferred continuations. Moreover, we identify a connection between the proposed IRL-based approach and certain recently proposed self-play approaches, and show that self-play is a special case of modeling a reward-learning agent. Theoretically, we show that the proposed algorithms converge to the stationary solutions of the IRL problem. Empirically, we align 1B and 7B models using the proposed methods and evaluate them on a reward benchmark model and the HuggingFace Open LLM Leaderboard. The proposed methods show significant performance improvement over existing SFT approaches. Our results indicate that it is beneficial to explicitly or implicitly leverage reward learning throughout the entire alignment process.
Submitted 29 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
Authors:
Yimeng Zhang,
Xin Chen,
**ghan Jia,
Yihua Zhang,
Chongyu Fan,
Jiancheng Liu,
Mingyi Hong,
Ke Ding,
Sijia Liu
Abstract:
Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. Machine unlearning techniques, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs' image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. The acquired text encoder can also serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Code is available at: https://github.com/OPTML-Group/AdvUnlearn
Submitted 14 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Multiphoton Quantum Imaging using Natural Light
Authors:
Fatemeh Mostafavi,
Mingyuan Hong,
Riley B. Dawkins,
Jannatul Ferdous,
Rui-Bo **,
Roberto de J. Leon-Montiel,
Chenglong You,
Omar S. Magana-Loaiza
Abstract:
It is thought that schemes for quantum imaging are fragile against realistic environments in which the background noise is often stronger than the nonclassical signal of the imaging photons. Unfortunately, it is infeasible to produce brighter quantum light sources to alleviate this problem. Here, we overcome this paradigmatic limitation by developing a quantum imaging scheme that relies on the use of natural sources of light. This is achieved by performing conditional detection on the photon number of the thermal light field scattered by a remote object. Specifically, the conditional measurements in our scheme enable us to extract quantum features of the detected thermal photons to produce quantum images with improved signal-to-noise ratios. This technique shows a remarkable exponential enhancement in the contrast of quantum images. Surprisingly, this measurement scheme enables the possibility of producing images from the vacuum fluctuations of the light field. This is experimentally demonstrated through the implementation of a single-pixel camera with photon-number-resolving capabilities. As such, we believe that our scheme opens a new paradigm in the field of quantum imaging. It also unveils the potential of combining natural light sources with nonclassical detection schemes for the development of robust quantum technologies.
Submitted 21 May, 2024;
originally announced May 2024.
-
Polarization-entangled photon pair source using beam displacers and thin crystals
Authors:
Minjae Hong,
Rodrigo Gomez,
Valerio Flavio Gili,
Jorge Fuenzalida,
Markus Gräfe
Abstract:
We present an experimental implementation of a polarization-entangled photon pair source based on beam displacers. The down-converted photons are emitted via spontaneous parametric down-conversion in a non-degenerate, type-0 process. We obtain a state fidelity of F=0.975$\pm$0.004 and violate a Clauser-Horne-Shimony-Holt inequality with S=2.75$\pm$0.01. Our source also uses thin crystals for applications in quantum imaging, taking advantage of the large number of spatial modes. With this configuration, we obtain 550$\pm$12 spatial modes.
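To put $S=2.75$ in context: local realism bounds the CHSH value by $|S|\le 2$, while quantum mechanics allows up to $2\sqrt2\approx 2.83$. The sketch below assumes the ideal state $(|HH\rangle+|VV\rangle)/\sqrt2$ measured with linear polarizers, for which $E(a,b)=\cos 2(a-b)$, and a simple white-noise model with visibility $V$; the source's actual state phase and noise model may differ.

```python
import math


def correlation(a_deg, b_deg, visibility=1.0):
    """E(a, b) = V * cos(2(a - b)) for (|HH> + |VV>)/sqrt(2) measured with linear
    polarisers at angles a and b (degrees); V < 1 models white-noise admixture."""
    return visibility * math.cos(2.0 * math.radians(a_deg - b_deg))


def chsh(a, a2, b, b2, visibility=1.0):
    """S = E(a, b) - E(a, b') + E(a', b) + E(a', b'); local realism requires |S| <= 2."""
    return (correlation(a, b, visibility) - correlation(a, b2, visibility)
            + correlation(a2, b, visibility) + correlation(a2, b2, visibility))
```

At the standard angles (0°, 45°, 22.5°, 67.5°), `chsh` returns $2\sqrt2$. Under this white-noise assumption, the reported $S=2.75$ would correspond to a visibility of about $2.75/(2\sqrt2)\approx 0.972$.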
Submitted 7 May, 2024;
originally announced May 2024.
-
EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
Authors:
Chung-Yiu Yau,
Hoi-To Wai,
Parameswaran Raman,
Soumajyoti Sarkar,
Mingyi Hong
Abstract:
A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, in order to learn better encodings of the data. These negative samples often follow a softmax distribution that is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small-batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and ImageNet-100.
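Why Metropolis-Hastings sidesteps the partition function can be seen in a few lines. This is a plain independence-proposal MH chain over a small, fixed set of negative scores, an illustrative assumption rather than EMC$^2$'s adaptive subroutine: with a uniform proposal, the acceptance probability is $\min\{1, \exp((s_j - s_i)/\tau)\}$, so the normalizer of $\operatorname{softmax}(s/\tau)$ cancels and is never computed. The function name is hypothetical.

```python
import math
import random


def mh_negative_sampler(scores, tau, steps, seed=0):
    """Metropolis-Hastings chain over negative indices targeting softmax(scores / tau).

    Uses a uniform (independence) proposal, so the acceptance ratio only needs
    exp((s_j - s_i) / tau): the partition function cancels and is never computed.
    Returns how often each index was visited.
    """
    rng = random.Random(seed)
    k = len(scores)
    state = rng.randrange(k)
    counts = [0] * k
    for _ in range(steps):
        proposal = rng.randrange(k)
        log_ratio = (scores[proposal] - scores[state]) / tau
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            state = proposal
        counts[state] += 1
    return counts
```

Visit frequencies approach the softmax probabilities as the chain runs, and higher-scoring ("harder") negatives are visited more often. In a training loop, one would keep the chain state across iterations instead of restarting it, since the scores change as the encoder is updated.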
Submitted 16 April, 2024;
originally announced April 2024.
-
Fractional derivatives of local times for some Gaussian processes
Authors:
Minhao Hong,
Qian Yu
Abstract:
In this article, we consider fractional derivatives of local time for $d$-dimensional centered Gaussian processes satisfying a certain strong local nondeterminism property. We first give a condition for the existence of fractional derivatives of the local time defined by Marchaud derivatives in $L^p(p\ge1)$ and show that these derivatives are Hölder continuous with respect to both time and space variables and are also continuous with respect to the order of derivatives. Moreover, under some additional assumptions, we show that this condition is also necessary for the existence of derivatives of the local time with the help of contour integration.
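For readers unfamiliar with Marchaud derivatives, the sketch below evaluates the left-sided Marchaud derivative of order $\alpha\in(0,1)$,
$D^\alpha f(x)=\frac{\alpha}{\Gamma(1-\alpha)}\int_0^\infty \frac{f(x)-f(x-t)}{t^{1+\alpha}}\,dt$,
for an ordinary function by quadrature. The choice of the left-sided version, the substitution $t=u^{1/(1-\alpha)}$ near $0$, and the truncation of the tail at `t_max` are illustrative assumptions; it assumes $f$ is smooth near $x$ and decays as its argument goes to $-\infty$. It says nothing about the local-time setting of the paper.

```python
import math


def marchaud_derivative(f, x, alpha, n0=20000, n_tail=100000, t_max=60.0):
    """Left-sided Marchaud derivative of order alpha in (0, 1) at x, by midpoint quadrature."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    fx = f(x)
    # On [0, 1]: substitute t = u**p with p = 1 / (1 - alpha), which removes the
    # t**(-alpha) behaviour of the integrand near t = 0.
    p = 1.0 / (1.0 - alpha)
    h = 1.0 / n0
    near = 0.0
    for i in range(n0):
        u = (i + 0.5) * h
        t = u ** p
        near += (fx - f(x - t)) * t ** (-1.0 - alpha) * p * u ** (p - 1.0)
    near *= h
    # On [1, inf): the f(x) part integrates exactly to f(x) / alpha;
    # the f(x - t) part is truncated at t_max.
    h = (t_max - 1.0) / n_tail
    tail = 0.0
    for i in range(n_tail):
        t = 1.0 + (i + 0.5) * h
        tail += f(x - t) * t ** (-1.0 - alpha)
    tail *= h
    return alpha / math.gamma(1.0 - alpha) * (near + fx / alpha - tail)
```

A classical check is that $e^x$ is invariant under this operator, $D^\alpha e^x = e^x$, so `marchaud_derivative(math.exp, 0.0, 0.5)` should be close to 1.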
Submitted 15 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seong** Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
Authors:
Xun Xian,
Ganghua Wang,
Xuan Bi,
Jayanth Srinivasa,
Ashish Kundu,
Mingyi Hong,
Jie Ding
Abstract:
Safeguarding intellectual property and preventing potential misuse of AI-generated images are of paramount importance. This paper introduces a robust and agile plug-and-play watermark detection framework, dubbed RAW. As a departure from traditional encoder-decoder methods, which incorporate fixed binary codes as watermarks within latent representations, our approach introduces learnable watermarks directly into the original image data. Subsequently, we employ a classifier that is jointly trained with the watermark to detect the presence of the watermark. The proposed framework is compatible with various generative architectures and supports on-the-fly watermark injection after training. By incorporating state-of-the-art smoothing techniques, we show that the framework provides provable guarantees regarding the false positive rate for misclassifying a watermarked image, even in the presence of certain adversarial attacks targeting watermark removal. Experiments on a diverse range of images generated by state-of-the-art diffusion models reveal substantial performance enhancements compared to existing approaches. For instance, our method demonstrates a notable increase in AUROC, from 0.48 to 0.82, when compared to state-of-the-art approaches in detecting watermarked images under adversarial attacks, while maintaining image quality, as indicated by closely aligned FID and CLIP scores.
Submitted 23 January, 2024;
originally announced March 2024.
-
Emergence of multiphoton quantum coherence by light propagation
Authors:
Jannatul Ferdous,
Mingyuan Hong,
Riley B. Dawkins,
Fatemeh Mostafavi,
Alina Oktyabrskaya,
Chenglong You,
Roberto de J. León-Montiel,
Omar S. Magaña-Loaiza
Abstract:
The modification of the quantum properties of coherence of photons through their interaction with matter lies at the heart of the quantum theory of light. Indeed, the absorption and emission of photons by atoms can lead to different kinds of light with characteristic quantum statistical properties. As such, different types of light are typically associated with distinct sources. Here, we report on the observation of the modification of quantum coherence of multiphoton systems in free space. This surprising effect is produced by the scattering of thermal multiphoton wavepackets upon propagation. The modification of the excitation mode of a photonic system and its associated quantum fluctuations result in the formation of different light fields with distinct quantum coherence properties. Remarkably, we show that these processes of scattering can lead to multiphoton systems with sub-shot-noise quantum properties. Our observations are validated through the nonclassical formulation of the emblematic van Cittert-Zernike theorem. We believe that the possibility of producing quantum systems with modified properties of coherence, through linear propagation, can have dramatic implications for diverse quantum technologies.
Submitted 5 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
The Quantum Gaussian-Schell Model: A Link Between Classical and Quantum Optics
Authors:
Riley B. Dawkins,
Mingyuan Hong,
Chenglong You,
Omar S. Magana-Loaiza
Abstract:
The quantum theory of the electromagnetic field revealed that classical forms of light are in fact produced by distinct superpositions of nonclassical multiphoton wavepackets. Partially coherent light is the most common kind of classical light. Here, for the first time, we demonstrate the extraction of the constituent multiphoton quantum systems of a partially coherent light field. We shift from the realm of classical optics to the domain of quantum optics via a quantum representation of partially coherent light using its complex-Gaussian statistical properties. Our formulation of the quantum Gaussian-Schell model unveils the possibility of performing photon-number-resolving detection to isolate the constituent quantum multiphoton wavepackets of a classical light field. We experimentally verified the coherence properties of isolated vacuum systems and wavepackets with up to sixteen photons. Our findings not only demonstrate the possibility of observing quantum properties of classical macroscopic objects, but also establish a fundamental bridge between the classical and quantum worlds.
Submitted 14 March, 2024;
originally announced March 2024.
-
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Authors:
Dohyeong Kim,
Mineui Hong,
Jeongho Park,
Songhwai Oh
Abstract:
In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines. To address these considerations, we propose a constrained multi-objective RL algorithm named Constrained Multi-Objective Gradient Aggregator (CoMOGA). In the field of multi-objective optimization, managing conflicts between the gradients of the multiple objectives is crucial to prevent policies from converging to local optima. It is also essential to efficiently handle safety constraints for stable training and constraint satisfaction. We address these challenges straightforwardly by treating the maximization of multiple objectives as a constrained optimization problem (COP), where the constraints are defined to improve the original objectives. Existing safety constraints are then integrated into the COP, and the policy is updated using a linear approximation, which ensures the avoidance of gradient conflicts. Despite its simplicity, CoMOGA guarantees optimal convergence in tabular settings. Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks.
Submitted 31 May, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
-
Pre-training Differentially Private Models with Limited Public Data
Authors:
Zhiqi Bu,
Xinwei Zhang,
Mingyi Hong,
Sheng Zha,
George Karypis
Abstract:
The superior performance of large foundation models relies on the use of massive amounts of high-quality data, which often contain sensitive, private and copyrighted material that requires formal protection. While differential privacy (DP) is a prominent method to gauge the degree of security provided to the models, its application is commonly limited to the model fine-tuning stage, due to the performance degradation when applying DP during the pre-training stage. Consequently, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training process.
In this work, we first provide a theoretical understanding of the efficacy of DP training by analyzing the per-iteration loss improvement. We make a key observation that DP optimizers' performance degradation can be significantly mitigated by the use of limited public data, which leads to a novel DP continual pre-training strategy. Empirically, using only 10\% of public data, our strategy can achieve DP accuracy of 41.5\% on ImageNet-21k (with $\varepsilon=8$), as well as non-DP accuracy of 55.7\% and 60.0\% on downstream tasks Places365 and iNaturalist-2021, respectively, on par with state-of-the-art standard pre-training and substantially outperforming existing DP pre-trained models.
Submitted 28 February, 2024;
originally announced February 2024.
-
Cieran: Designing Sequential Colormaps via In-Situ Active Preference Learning
Authors:
Matt-Heun Hong,
Zachary N. Sunberg,
Danielle Albers Szafir
Abstract:
Quality colormaps can help communicate important data patterns. However, finding an aesthetically pleasing colormap that looks "just right" for a given scenario requires significant design and technical expertise. We introduce Cieran, a tool that allows any data analyst to rapidly find quality colormaps while designing charts within Jupyter Notebooks. Our system employs an active preference learning paradigm to rank expert-designed colormaps and create new ones from pairwise comparisons, allowing analysts who are novices in color design to tailor colormaps to their data context. We accomplish this by treating colormap design as a path planning problem through the CIELAB colorspace with a context-specific reward model. In an evaluation with twelve scientists, we found that Cieran effectively modeled user preferences to rank colormaps and leveraged this model to create new quality designs. Our work shows the potential of active preference learning for supporting efficient visualization design optimization.
Submitted 29 February, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
Authors:
Yihua Zhang,
Pingzhi Li,
Junyuan Hong,
Jiaxiang Li,
Yimeng Zhang,
Wenqing Zheng,
Pin-Yu Chen,
Jason D. Lee,
Wotao Yin,
Mingyi Hong,
Zhangyang Wang,
Sijia Liu,
Tianlong Chen
Abstract:
In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Going beyond traditional ZO-SGD, our work expands the exploration to a wider array of ZO optimization techniques through a comprehensive, first-of-its-kind benchmarking study across five LLM families (RoBERTa, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction toward more memory-efficient LLM fine-tuning. Code to reproduce all our experiments is at https://github.com/ZO-Bench/ZO-LLM .
Submitted 27 May, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
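The benchmark above studies back-propagation-free, zeroth-order (ZO) fine-tuning. As a rough illustration of the idea only, the sketch below uses a two-point randomized gradient estimator with Gaussian directions (the style popularized by MeZO) inside plain ZO-SGD on a toy quadratic. The helper names `zo_gradient` and `zo_sgd`, the smoothing radius `mu`, and the step size are hypothetical choices for this example, not code from the ZO-LLM repository.

```python
import random

def zo_gradient(loss, params, mu=1e-3, rng=random):
    """Two-point zeroth-order gradient estimate from loss values only.

    Draws a Gaussian direction z and returns
    (loss(params + mu*z) - loss(params - mu*z)) / (2*mu) * z.
    No backward pass is needed.
    """
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + mu * zi for p, zi in zip(params, z)]
    minus = [p - mu * zi for p, zi in zip(params, z)]
    scale = (loss(plus) - loss(minus)) / (2.0 * mu)
    return [scale * zi for zi in z]

def zo_sgd(loss, params, lr=0.01, steps=2000, seed=0):
    """Plain SGD driven by the zeroth-order estimate above."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        g = zo_gradient(loss, params, rng=rng)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

def quadratic(x):
    # f(x) = sum_i (x_i - 1)^2, minimized at x = (1, ..., 1)
    return sum((xi - 1.0) ** 2 for xi in x)

x = zo_sgd(quadratic, [0.0, 0.0, 0.0])
```

Each step costs two loss evaluations and no backward pass, which is the memory saving the abstract refers to; the price is a noisier gradient estimate, so ZO methods typically need more iterations than first-order ones.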
-
Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition
Authors:
Yijie Wang,
Mingjian Hong,
Luwen Huangfu,
Sheng Huang
Abstract:
In the realm of Zero-Shot Learning (ZSL), we address biases in Generalized Zero-Shot Learning (GZSL) models, which favor seen data. To counter this, we introduce an end-to-end generative GZSL framework called D$^3$GZSL. This framework treats seen and synthesized unseen data as in-distribution and out-of-distribution data, respectively, for a more balanced model. D$^3$GZSL comprises two core modules: in-distribution dual space distillation (ID$^2$SD) and out-of-distribution batch distillation (O$^2$DBD). ID$^2$SD aligns teacher-student outcomes in embedding and label spaces, enhancing learning coherence. O$^2$DBD introduces low-dimensional out-of-distribution representations per batch sample, capturing shared structures between seen and unseen categories. Our approach demonstrates its effectiveness across established GZSL benchmarks, seamlessly integrating into mainstream generative frameworks. Extensive experiments consistently showcase that D$^3$GZSL elevates the performance of existing generative GZSL methods, underscoring its potential to refine zero-shot learning practices. The code is available at: https://github.com/PJBQ/D3GZSL.git
Submitted 17 February, 2024;
originally announced February 2024.
-
Problem-Parameter-Free Decentralized Nonconvex Stochastic Optimization
Authors:
Jiaxiang Li,
Xuxing Chen,
Shiqian Ma,
Mingyi Hong
Abstract:
Existing decentralized algorithms usually require knowledge of problem parameters for updating local iterates. For example, hyperparameters such as the learning rate usually require knowledge of the Lipschitz constant of the global gradient or topological information of the communication network, which are usually not accessible in practice. In this paper, we propose D-NASA, the first algorithm for decentralized nonconvex stochastic optimization that requires no prior knowledge of any problem parameters. We show that D-NASA has the optimal rate of convergence for nonconvex objectives under very mild conditions and enjoys the linear-speedup effect, i.e., the computation becomes faster as the number of nodes in the system increases. Extensive numerical experiments are conducted to support our findings.
Submitted 13 February, 2024;
originally announced February 2024.
-
Starobinsky Inflation and beyond in Einstein-Cartan Gravity
Authors:
Minxi He,
Muzi Hong,
Kyohei Mukaida
Abstract:
We show that various types of scalaron-induced inflation, including the Starobinsky inflation, can be realized in the Einstein-Cartan gravity with the Nieh-Yan term and/or the Holst term. Einstein-Cartan $f(R)$ theory is known not to induce an additional scalar degree of freedom, the scalaron, contrary to the case in the metric formalism. However, there exist geometric quantities other than the Ricci scalar in the Einstein-Cartan gravity, such as the Nieh-Yan and the Holst terms. Once we introduce them in addition to the Ricci scalar and allow general combinations up to their quadratic order, the scalaron can become dynamical to realize inflation. When the associated matrix of the quadratic part has rank one, the models are equivalent to the $α$-attractor inflation and its deformation, including the Starobinsky inflation and quadratic chaotic inflation, etc. For more general cases with rank greater than one, the models fall into the class of $k$-essence models, which reduce to the rank-one case in a particular limit.
Submitted 7 February, 2024;
originally announced February 2024.
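For orientation, the rank-one case above is said to reproduce $\alpha$-attractor inflation. A commonly used form of the E-model $\alpha$-attractor potential (the standard parameterization from the literature, not necessarily the normalization used in the paper) is
\[
V(\phi) = V_0\left(1 - e^{-\sqrt{\frac{2}{3\alpha}}\,\phi/M_{\rm Pl}}\right)^2 ,
\]
where $\alpha = 1$ gives the Starobinsky potential. For $\phi \ll \sqrt{3\alpha/2}\,M_{\rm Pl}$ the exponential can be expanded, $1 - e^{-x} \simeq x$, giving
\[
V(\phi) \simeq \frac{2V_0}{3\alpha M_{\rm Pl}^2}\,\phi^2 ,
\]
which is the quadratic chaotic-inflation potential recovered in the large-$\alpha$ limit.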
-
A Survey of Recent Advances in Optimization Methods for Wireless Communications
Authors:
Ya-Feng Liu,
Tsung-Hui Chang,
Mingyi Hong,
Zheyu Wu,
Anthony Man-Cho So,
Eduard A. Jorswieck,
Wei Yu
Abstract:
Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and on to the future 6G, the innovations in wireless technologies have also substantially transformed the nature of the underlying mathematical optimization problems upon which the system designs are based and have sparked significant innovations in the development of methodologies to understand, to analyze, and to solve those problems. In this paper, we provide a comprehensive survey of recent advances in mathematical optimization theory and algorithms for wireless communication system design. We begin by illustrating common features of mathematical optimization problems arising in wireless communication system design. We discuss various scenarios and use cases and their associated mathematical structures from an optimization perspective. We then provide an overview of recently developed optimization techniques in areas ranging from nonconvex optimization, global optimization, and integer programming, to distributed optimization and learning-based optimization. The key to successfully solving mathematical optimization problems lies in carefully choosing or developing suitable algorithms (or neural network architectures) that can exploit the underlying problem structure. We conclude the paper by identifying several open research challenges and outlining future research directions.
Submitted 7 June, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Authors:
Mao Hong,
Zhiyue Zhang,
Yue Wu,
Yanxun Xu
Abstract:
Model-based offline reinforcement learning (RL) methods have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly-used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees of MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies.
Submitted 20 January, 2024;
originally announced January 2024.
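For intuition about the policy-improvement step, the sketch below shows the textbook KL (exponentiated) mirror-ascent update for a single state with a finite action set, $\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\exp(\eta\, Q_k(s,a))$. It is a tabular illustration under that assumption only: MoMA itself uses general function approximation and a conservative value estimate over a confidence set of models, neither of which is implemented here, and `mirror_ascent_step` is a hypothetical name.

```python
import math

def mirror_ascent_step(policy, q_values, eta):
    """One KL mirror-ascent update for a single state.

    policy:   action probabilities pi_k(a|s)
    q_values: action values Q_k(s, a), e.g. a conservative estimate
    Returns pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q_k(s, a)).
    """
    # Subtracting max Q keeps exp() from overflowing; it cancels on normalization.
    q_max = max(q_values)
    weights = [p * math.exp(eta * (q - q_max)) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]

pi = [1 / 3, 1 / 3, 1 / 3]
q = [1.0, 0.0, -1.0]
for _ in range(5):
    pi = mirror_ascent_step(pi, q, eta=0.5)
```

With a fixed $Q$, $k$ steps from a uniform policy compound to $\pi_k \propto \exp(k\eta Q)$, so the step size $\eta$ controls how quickly probability mass concentrates on the highest-value action while every action keeps nonzero probability.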
-
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Authors:
Kaan Ozkara,
Can Karakus,
Parameswaran Raman,
Mingyi Hong,
Shoham Sabach,
Branislav Kveton,
Volkan Cevher
Abstract:
Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and dynamically search through it using hyper-gradient descent during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers, and is robust against sub-optimally tuned hyper-parameters. MADA achieves a greater validation performance improvement over Adam compared to other popular optimizers during GPT-2 training and fine-tuning. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization. Finally, we provide a convergence analysis to show that parameterized interpolations of optimizers can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.
Submitted 17 June, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
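AMSGrad divides by the square root of $\hat v_t = \max(\hat v_{t-1}, v_t)$, where $v_t$ is Adam's exponential moving average of squared gradients. The abstract says AVGrad replaces this maximum with averaging; the sketch below assumes that means the running arithmetic mean $\hat v_t = \frac{1}{t}\sum_{i\le t} v_i$, which may differ in detail (weighting, bias correction, placement of $\epsilon$) from the paper's definition. It tracks only the denominator for a single scalar parameter, and `second_moment_denominators` is a hypothetical helper.

```python
import math

def second_moment_denominators(grads, beta2=0.9, mode="max"):
    """Return sqrt(v_hat_t) at each step t for a scalar parameter.

    mode="max": AMSGrad rule, v_hat_t = max(v_hat_{t-1}, v_t)
    mode="avg": assumed averaging rule, v_hat_t = (v_1 + ... + v_t) / t
    """
    v = 0.0       # Adam-style EMA of squared gradients
    v_hat = 0.0
    v_sum = 0.0
    denominators = []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        if mode == "max":
            v_hat = max(v_hat, v)
        elif mode == "avg":
            v_sum += v
            v_hat = v_sum / t
        else:
            raise ValueError("mode must be 'max' or 'avg'")
        denominators.append(math.sqrt(v_hat))
    return denominators

# One gradient spike followed by many small gradients.
grads = [10.0] + [0.1] * 999
ams = second_moment_denominators(grads, mode="max")
avg = second_moment_denominators(grads, mode="avg")
```

In this toy sequence, the single large early gradient keeps the AMSGrad denominator pinned at its peak for the rest of the run, while the averaged version gradually forgets it. This is only one visible difference between the two rules; the abstract's stated reason for averaging is its suitability for hyper-gradient optimization.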
-
SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation
Authors:
Ming-Yi Hong,
Yi-Hsiang Huang,
Shao-En Lin,
You-Chen Teng,
Chih-Yu Wang,
Che Lin
Abstract:
Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets is accentuated, particularly in the context of Heterogeneous Information Networks (HIN). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.
Submitted 29 May, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate
Authors:
Ruichen Jiang,
Parameswaran Raman,
Shoham Sabach,
Aryan Mokhtari,
Mingyi Hong,
Volkan Cevher
Abstract:
Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates depending on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems. Here, $m$ represents the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.
Submitted 5 January, 2024;
originally announced January 2024.
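The key ingredient above is the Krylov subspace $\mathcal{K}_m = \mathrm{span}\{g, Hg, \dots, H^{m-1}g\}$ built from the gradient $g$ and the Hessian $H$. The sketch below constructs an orthonormal basis for it with Arnoldi iteration (modified Gram-Schmidt) using only Hessian-vector products, then forms the reduced gradient $V^\top g$ and reduced Hessian $V^\top H V$ on which an $m$-dimensional cubic-regularized Newton subproblem would be solved. The cubic subproblem solver is omitted, the helper names are hypothetical, and this is not the authors' implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def krylov_basis(hvp, g, m, tol=1e-12):
    """Orthonormal basis of span{g, Hg, ..., H^(m-1) g} via Arnoldi iteration.

    hvp: function v -> H v, so H is never formed explicitly.
    Stops early if the subspace becomes invariant (fewer than m vectors).
    """
    norm_g = math.sqrt(dot(g, g))
    basis = [[x / norm_g for x in g]]
    while len(basis) < m:
        w = hvp(basis[-1])
        for q in basis:  # modified Gram-Schmidt against all previous vectors
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm_w = math.sqrt(dot(w, w))
        if norm_w < tol:
            break
        basis.append([x / norm_w for x in w])
    return basis

def reduce_problem(hvp, g, basis):
    """Reduced gradient V^T g and reduced Hessian V^T H V (an m x m matrix)."""
    g_red = [dot(q, g) for q in basis]
    h_cols = [hvp(q) for q in basis]
    h_red = [[dot(qi, hq) for hq in h_cols] for qi in basis]
    return g_red, h_red

# Toy problem: H = diag(1, 2, 3, 4), g = (1, 1, 1, 1).
DIAG = [1.0, 2.0, 3.0, 4.0]

def hvp(v):
    return [d * x for d, x in zip(DIAG, v)]

g = [1.0, 1.0, 1.0, 1.0]
V = krylov_basis(hvp, g, m=2)
g_red, h_red = reduce_problem(hvp, g, V)
```

Because the first basis vector is $g/\lVert g\rVert$, the reduced gradient is $(\lVert g\rVert, 0, \dots, 0)$, and the memory cost scales with $m$ rather than with the full dimension $d$.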
-
BioSpark: An End-to-End Generative System for Biological-Analogical Inspirations and Ideation
Authors:
Hyeonsu B. Kang,
David Chuan-En Lin,
Nikolas Martelaro,
Aniket Kittur,
Yan-Ying Chen,
Matthew K. Hong
Abstract:
Nature is often used to inspire solutions for complex engineering problems, but achieving its full potential is challenging due to difficulties in discovering relevant analogies and synthesizing from them. Here, we present an end-to-end system, BioSpark, that generates biological-analogical mechanisms and provides an interactive interface to comprehend and synthesize from them. The BioSpark pipeline starts with a small seed set of mechanisms and expands it using iteratively constructed taxonomic hierarchies, overcoming data sparsity in manual expert curation and limited conceptual diversity in automated analogy generation via LLMs. The interface helps designers recognize and understand analogs relevant to their design problems through four main interaction features. We evaluate the biological-analogical mechanism generation pipeline and showcase the value of BioSpark through case studies. We end with discussion and implications for future work in this area.
Submitted 18 December, 2023;
originally announced December 2023.
-
A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling
Authors:
Hung Chun Hsu,
Bo-Jun Wu,
Ming-Yi Hong,
Che Lin,
Chih-Yu Wang
Abstract:
Our research addresses class imbalance issues in heterogeneous graphs using graph neural networks (GNNs). We propose a novel method combining the strengths of Generative Adversarial Networks (GANs) with GNNs, creating synthetic nodes and edges that effectively balance the dataset. This approach directly targets and rectifies imbalances at the data level. The proposed framework addresses the neglect of graph structure during data generation and produces synthetic structures usable with GNN-based classifiers in downstream tasks. It processes node and edge information concurrently, improving edge balance through node augmentation and subgraph sampling. Additionally, our framework integrates a threshold strategy, aiding in determining optimal edge thresholds during training without time-consuming parameter adjustments. Experiments on the Amazon and Yelp Review datasets highlight the effectiveness of the proposed framework, especially in minority node identification, where it consistently outperforms baseline models across key performance metrics, demonstrating its potential in the field.
Submitted 11 December, 2023;
originally announced December 2023.
-
Diffused Task-Agnostic Milestone Planner
Authors:
Mineui Hong,
Minjae Kang,
Songhwai Oh
Abstract:
Addressing decision-making problems using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space and to have an agent follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.
Submitted 6 December, 2023;
originally announced December 2023.
-
Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach
Authors:
Xinwei Zhang,
Zhiqi Bu,
Zhiwei Steven Wu,
Mingyi Hong
Abstract:
Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) is a powerful tool for training deep learning models using sensitive data, providing both a solid theoretical privacy guarantee and high efficiency. However, using DPSGD-GC to ensure Differential Privacy (DP) comes at the cost of model performance degradation due to DP noise injection and gradient clipping. Existing research has extensively analyzed the theoretical convergence of DPSGD-GC, and has shown that it only converges when using large clipping thresholds that are dependent on problem-specific parameters. Unfortunately, these parameters are often unknown in practice, making it hard to choose the optimal clipping threshold. Therefore, in practice, DPSGD-GC suffers from degraded performance due to the {\it constant} bias introduced by the clipping.
In our work, we propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC, which not only offers a diminishing utility bound without inducing a constant clipping bias, but more importantly, allows for an arbitrary choice of clipping threshold that is independent of the problem. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP. Additionally, we demonstrate that under mild conditions, our algorithm can achieve nearly the same utility bound as DPSGD without gradient clipping. Our empirical results on Cifar-10/100 and E2E datasets show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
Submitted 17 April, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Conversational AI Threads for Visualizing Multidimensional Datasets
Authors:
Matt-Heun Hong,
Anamaria Crisan
Abstract:
Generative Large Language Models (LLMs) show potential in data analysis, yet their full capabilities remain uncharted. Our work explores the capabilities of LLMs for creating and refining visualizations via conversational interfaces. We used an LLM to conduct a re-analysis of a prior Wizard-of-Oz study examining the use of chatbots for conducting visual analysis. We surfaced the strengths and weaknesses of LLM-driven analytic chatbots, finding that they fell short in supporting progressive visualization refinements. From these findings, we developed AI Threads, a multi-threaded analytic chatbot that enables analysts to proactively manage conversational context and improve the efficacy of its outputs. We evaluate its usability through a crowdsourced study (n=40) and in-depth interviews with expert analysts (n=10). We further demonstrate the capabilities of AI Threads on a dataset outside the LLM's training corpus. Our findings show the potential of LLMs while also surfacing challenges and fruitful avenues for future research.
Submitted 9 November, 2023;
originally announced November 2023.
-
Demystifying Poisoning Backdoor Attacks from a Statistical Perspective
Authors:
Ganghua Wang,
Xun Xian,
Jayanth Srinivasa,
Ashish Kundu,
Xuan Bi,
Mingyi Hong,
Jie Ding
Abstract:
The growing dependence on machine learning in real-world applications emphasizes the importance of understanding and ensuring its safety. Backdoor attacks pose a significant security risk due to their stealthy nature and potentially serious consequences. Such attacks involve embedding triggers within a learning model with the intention of causing malicious behavior when an active trigger is present while maintaining regular functionality without it. This paper evaluates the effectiveness of any backdoor attack incorporating a constant trigger by establishing tight lower and upper bounds on the performance of the compromised model on both clean and backdoor test data. The developed theory answers a series of fundamental but previously underexplored questions, including (1) what the determining factors for a backdoor attack's success are, (2) what the direction of the most effective backdoor attack is, and (3) when a human-imperceptible trigger will succeed. Our derived understanding applies to both discriminative and generative models. We also demonstrate the theory by conducting experiments using benchmark datasets and state-of-the-art backdoor attack scenarios.
Submitted 17 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning
Authors:
Yihua Zhang,
Yimeng Zhang,
Aochuan Chen,
Jinghan Jia,
Jiancheng Liu,
Gaowen Liu,
Mingyi Hong,
Shiyu Chang,
Sijia Liu
Abstract:
Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by identifying and removing redundant training samples without sacrificing performance. In this work, we aim to address the problem of DP for transfer learning, i.e., how to prune a source dataset for improved pretraining efficiency and lossless finetuning accuracy on downstream target tasks. To the best of our knowledge, the problem of DP for transfer learning remains open, as previous studies have primarily addressed DP and transfer learning as separate problems. By contrast, we establish a unified viewpoint to integrate DP with transfer learning and find that existing DP methods are not suitable for the transfer learning paradigm. We then propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings respectively, by revisiting the DP problem through the lens of source-target domain mapping. Furthermore, we demonstrate the effectiveness of our approach on numerous transfer learning tasks. We show that source data classes can be pruned by up to 40% to 80% without sacrificing downstream performance, resulting in a significant 2 to 5 times speed-up during the pretraining stage. Besides, our proposal exhibits broad applicability and can improve other computationally intensive transfer learning techniques, such as adversarial pretraining. Code is available at https://github.com/OPTML-Group/DP4TL.
Submitted 18 November, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
A Bayesian Approach to Robust Inverse Reinforcement Learning
Authors:
Ran Wei,
Siliang Zeng,
Chenliang Li,
Alfredo Garcia,
Anthony McDonald,
Mingyi Hong
Abstract:
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by simultaneously estimating the expert's reward function and the expert's subjective model of the environment dynamics. We use a class of prior distributions that parameterizes how accurate the expert's model of the environment is, and build on it to develop efficient algorithms for estimating the expert's reward and subjective dynamics in high-dimensional settings. Our analysis reveals a novel insight: the estimated policy exhibits robust performance when the expert is believed (a priori) to have a highly accurate model of the environment. We verify this observation in the MuJoCo environments and show that our algorithms outperform state-of-the-art offline IRL algorithms.
Submitted 6 April, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Quartic Gradient Flow
Authors:
Muzi Hong,
Ryusuke Jinno
Abstract:
Saddle-point configurations, such as the Euclidean bounce and sphalerons, are known to be difficult to find numerically. In this Letter we study a new method, Quartic Gradient Flow, to search for such configurations. The central idea is to introduce a gradient-flow-like equation in such a way that all the fluctuations around the saddle point have eigenvalues that are the squares of the eigenvalues of the original quadratic operator. We illustrate how the method works for the Euclidean bounce and sphalerons.
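The central idea can be illustrated on a finite-dimensional toy problem. Assuming the gradient-flow-like equation takes the form $\dot p = -H(p)\,\nabla V(p)$, with $H$ the Hessian of $V$ (equivalently, ordinary gradient flow on $\tfrac12|\nabla V|^2$), linearizing around a critical point $p^*$ gives $\dot{\delta p} = -H(p^*)^2\,\delta p$, so each eigenvalue $\lambda$ of $H$ enters as $-\lambda^2 \le 0$ and a saddle becomes attractive. The sketch below is a minimal explicit-Euler illustration on the hypothetical potential $V(x,y) = (x^2-1)^2 + y^2$, which has minima at $(\pm1, 0)$ and a saddle at $(0,0)$; it is not the field-theoretic implementation used in the Letter.

```python
import numpy as np


def grad_V(p):
    """Gradient of the toy potential V(x, y) = (x**2 - 1)**2 + y**2."""
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])


def hess_V(p):
    """Hessian of the same toy potential (diagonal here)."""
    x, _ = p
    return np.array([[12.0 * x**2 - 4.0, 0.0], [0.0, 2.0]])


def gradient_flow(p0, dt=0.01, steps=2000):
    """Ordinary gradient flow dp/dt = -grad V, integrated with explicit Euler."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - dt * grad_V(p)
    return p


def quartic_flow(p0, dt=0.01, steps=2000):
    """Assumed quartic-type flow dp/dt = -H grad V, integrated with explicit Euler.

    Near a critical point the linearized flow is -H^2, whose eigenvalues are
    minus the squares of the Hessian eigenvalues, so unstable saddle directions
    become stable.
    """
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - dt * hess_V(p) @ grad_V(p)
    return p
```

Started from (0.3, 0.5), `gradient_flow` rolls down to the minimum at (1, 0), whereas `quartic_flow` converges to the saddle at (0, 0). The step size is chosen so that explicit Euler is stable for both flows on this particular potential.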
Submitted 22 August, 2023;
originally announced August 2023.
-
An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning
Authors:
Yihua Zhang,
Prashant Khanduri,
Ioannis Tsaknakis,
Yuguang Yao,
Mingyi Hong,
Sijia Liu
Abstract:
Recently, bi-level optimization (BLO) has taken center stage in some very exciting developments in the area of signal processing (SP) and machine learning (ML). Roughly speaking, BLO is a classical optimization problem that involves two levels of hierarchy (i.e., upper and lower levels), wherein obtaining the solution to the upper-level problem requires solving the lower-level one. BLO has become popular largely because it is powerful in modeling problems in SP and ML, among others, that involve optimizing nested objective functions. Prominent applications of BLO range from resource allocation for wireless systems to adversarial machine learning. In this work, we focus on a class of tractable BLO problems that often appear in SP and ML applications. We provide an overview of some basic concepts of this class of BLO problems, such as their optimality conditions, standard algorithms (including their optimization principles and practical implementations), as well as how they can be leveraged to obtain state-of-the-art results for a number of key SP and ML applications. Further, we discuss some recent advances in BLO theory, its implications for applications, and point out some limitations of the state-of-the-art that require significant future research efforts. Overall, we hope that this article can serve to accelerate the adoption of BLO as a generic tool to model, analyze, and innovate on a wide array of emerging SP and ML applications.
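As a concrete instance of a tractable BLO problem with a strongly convex lower level, the toy sketch below (hypothetical functions and constants, chosen only for illustration) minimizes the upper-level objective $f(y) = \tfrac12 (y-b)^2$ subject to $y^*(x) = \arg\min_y g(x,y)$ with $g(x,y) = \tfrac12 (y-a)^2 + \tfrac12 x y^2$. The lower level is solved approximately by warm-started gradient descent, and the upper-level gradient uses the implicit-function-theorem formula $\frac{dy^*}{dx} = -\big(\nabla^2_{yy} g\big)^{-1}\nabla^2_{yx} g$, one standard hypergradient approach among several.

```python
def lower_level_solve(x, a, y0=0.0, lr=0.2, steps=100):
    """Approximately solve min_y g(x, y) = 0.5*(y - a)**2 + 0.5*x*y**2 by gradient descent.

    Assumes 0 < 1 + x < 10 so that g is strongly convex in y and the step size is stable.
    """
    y = y0
    for _ in range(steps):
        y -= lr * ((1.0 + x) * y - a)  # dg/dy = (1 + x) * y - a
    return y


def hypergradient(x, y, b):
    """Gradient of F(x) = f(y*(x)) with f(y) = 0.5*(y - b)**2, via the implicit function theorem.

    d2g/dy2 = 1 + x and d2g/dydx = y, so dy*/dx = -y / (1 + x).
    """
    dy_dx = -y / (1.0 + x)
    return (y - b) * dy_dx  # f does not depend on x directly


def solve_bilevel(a=2.0, b=1.0, x0=0.0, lr=1.0, outer_steps=200):
    """Alternate an approximate lower-level solve with an upper-level gradient step."""
    x, y = x0, 0.0
    for _ in range(outer_steps):
        y = lower_level_solve(x, a, y0=y)  # warm start from the previous solution
        x -= lr * hypergradient(x, y, b)
    return x, y
```

With $a=2$ and $b=1$, the lower-level solution is $y^*(x) = 2/(1+x)$, so the upper level is minimized at $x=1$, $y=1$, which `solve_bilevel()` recovers.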
Submitted 20 December, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
MolFM: A Multimodal Molecular Foundation Model
Authors:
Yizhen Luo,
Kai Yang,
Massimo Hong,
Xing Yi Liu,
Zaiqing Nie
Abstract:
Molecular knowledge resides within three different modalities of information sources: molecular structures, biomedical documents, and knowledge bases. Effective incorporation of molecular knowledge from these modalities holds paramount significance in facilitating biomedical research. However, existing multimodal molecular foundation models exhibit limitations in capturing intricate connections between molecular structures and texts, and more importantly, none of them attempt to leverage a wealth of molecular expertise derived from knowledge graphs. In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs. We propose cross-modal attention between atoms of molecular structures, neighbors of molecule entities and semantically related texts to facilitate cross-modal comprehension. We provide a theoretical analysis showing that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule, as well as between molecules sharing similar structures or functions. MolFM achieves state-of-the-art performance on various downstream tasks. On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively. Furthermore, qualitative analysis showcases MolFM's implicit ability to provide grounding from molecular substructures and knowledge graphs. Code and models are available at https://github.com/BioFM/OpenBioMed.
Submitted 21 July, 2023; v1 submitted 6 June, 2023;
originally announced July 2023.
-
Next Steps for Human-Centered Generative AI: A Technical Perspective
Authors:
Xiang 'Anthony' Chen,
Jeff Burke,
Ruofei Du,
Matthew K. Hong,
Jennifer Jacobs,
Philippe Laban,
Dingzeyu Li,
Nanyun Peng,
Karl D. D. Willis,
Chien-Sheng Wu,
Bolei Zhou
Abstract:
Through iterative, cross-disciplinary discussions, we define and propose next steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next steps, we intend to draw interdisciplinary research teams to pursue a coherent set of emergent ideas in HGAI, focusing on topics of interest to them while maintaining a coherent big picture of the future work landscape.
Submitted 22 December, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Illuminating all-hadronic final states with a photon: Exotic decays of the Higgs boson to four bottom quarks in vector boson fusion plus gamma at hadron colliders
Authors:
Stephen T. Roche,
Benjamin T. Carlson,
Christopher R. Hayes,
Tae Min Hong
Abstract:
We investigate the potential to detect Higgs boson decays to four bottom quarks through a pair of pseudoscalars, a final state that is predicted by many theories beyond the Standard Model. For the first time, the signal sensitivity is evaluated for the final state using the vector boson fusion (VBF) production with and without an associated photon, for the Higgs at $m_H=125\,\textrm{GeV}$, at hadron colliders. The signal significance is $4$ to $6\sigma$, depending on the pseudoscalar mass $m_a$, when setting the Higgs decay branching ratio to unity, using an integrated luminosity of $150\,\textrm{fb}^{-1}$ at $\sqrt{s}=13\,\textrm{TeV}$. This corresponds to an upper limit of $0.3$ on the Higgs branching ratio to four bottom quarks if the decay is not observed. We also consider several variations of selection requirements (input variables for the VBF tagging and the kinematic variables for the photon) that could help guide the design of new triggers for the Run-3 period of the LHC and for the HL-LHC.
Submitted 28 June, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Generative AI for Product Design: Getting the Right Design and the Design Right
Authors:
Matthew K. Hong,
Shabnam Hakimi,
Yan-Ying Chen,
Heishiro Toyoda,
Charlene Wu,
Matt Klenk
Abstract:
Generative AI (GenAI) models excel in their ability to recognize patterns in existing data and generate new and unexpected content. Recent advances have motivated applications of GenAI tools (e.g., Stable Diffusion, ChatGPT) to professional practice across industries, including product design. While these generative capabilities may seem enticing on the surface, certain barriers limit their practical application for real-world use in industry settings. In this position paper, we articulate and situate these barriers within two phases of the product design process, namely "getting the right design" and "getting the design right," and propose a research agenda to stimulate discussions around opportunities for realizing the full potential of GenAI tools in product design.
Submitted 1 June, 2023;
originally announced June 2023.
-
A Policy Gradient Method for Confounded POMDPs
Authors:
Mao Hong,
Zhengling Qi,
Yanxun Xu
Abstract:
In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt the min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, length of horizon, concentrability coefficient and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in the gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.
Submitted 30 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Limit theorems for additive functionals of some self-similar Gaussian processes
Authors:
Minhao Hong,
Heguang Liu,
Fangjun Xu
Abstract:
Under certain mild conditions, limit theorems for additive functionals of some $d$-dimensional self-similar Gaussian processes are obtained. These limit theorems work for general Gaussian processes including fractional Brownian motions, sub-fractional Brownian motions and bi-fractional Brownian motions. To prove these results, we use the method of moments and an enhanced chaining argument. The Gaussian processes under consideration are required to satisfy a certain strong local nondeterminism property. A tractable sufficient condition for the strong local nondeterminism property is given, and it relies only on the covariance functions of the Gaussian processes. Moreover, we give a sufficient condition for the distribution function of a random vector to be determined by its moments.
Submitted 22 May, 2023;
originally announced May 2023.
-
Conservative Physics-Informed Neural Networks for Non-Conservative Hyperbolic Conservation Laws Near Critical States
Authors:
Reyna Quita,
Yu-Shuo Chen,
Hsin-Yi Lee,
Alex C. Hu,
John M. Hong
Abstract:
In this paper, a modified version of conservative Physics-informed Neural Networks (cPINN for short) is provided to construct the weak solutions of the Riemann problem for hyperbolic scalar conservation laws in non-conservative form. To demonstrate the results, we use the model of the generalized Buckley-Leverett equation (GBL equation for short) with discontinuous porosity in porous media. By introducing a new unknown, the GBL equation is transformed into a two-by-two resonant hyperbolic system of conservation laws in conservative form. The modified cPINN is designed to overcome the difficulties caused by the discontinuity of the porosity and the appearance of critical states (near vacuum) in the Riemann data. We test this idea by using a deep learning algorithm to solve the GBL equation in both conservative and non-conservative forms, for both critical and non-critical states. The method combines two different neural networks and corresponding loss functions: one for the two-by-two resonant hyperbolic system, and the other for the scalar conservation law with a discontinuous perturbation term in the non-convex flux. Rescaling of the unknowns is adopted to avoid oscillations of the Riemann solutions in the case of critical Riemann data. The solutions constructed by the modified cPINN match the exact solutions constructed by the theoretical analysis of hyperbolic conservation laws. In addition, the solutions are identical in the conservative and non-conservative cases. Finally, we compare the performance of the modified cPINN with the numerical method WENO5. Whereas WENO5 struggles with highly oscillatory approximate solutions for the Riemann problems of the GBL equation in non-conservative form, cPINN works admirably.
Submitted 22 May, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Authors:
Zhanpeng Zeng,
Cole Hawkins,
Mingyi Hong,
Aston Zhang,
Nikolaos Pappas,
Vikas Singh,
Shuai Zheng
Abstract:
Transformers are central in modern natural language processing and computer vision applications. Despite recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length), dealing with ultra-long sequences (e.g., with more than 16K tokens) remains challenging. Applications such as answering questions based on a book or summarizing a scientific article are inefficient or infeasible. Here, we propose to significantly improve the efficiency of Transformers for ultra-long sequences by compressing the sequence into a much smaller representation at each layer. Specifically, by exploiting the fact that in many tasks only a small subset of special tokens (which we call VIP-tokens) is most relevant to the final prediction, we propose a VIP-token centric compression (VCC) scheme, which selectively compresses the sequence based on each token's impact on approximating the representations of the VIP-tokens. Compared with competitive baselines, our algorithm is not only efficient (achieving more than $3\times$ efficiency gain compared to baselines on 4K and 16K lengths), but also offers competitive or better performance on a large number of tasks. Further, we show that our algorithm scales to 128K tokens (or more) while consistently offering accuracy improvement.
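To make the VIP-token idea concrete, here is a heavily simplified toy sketch. The function `vip_centric_compress` and its scoring rule are hypothetical and are not the VCC scheme from the paper: non-VIP tokens are ranked by the total attention the VIP tokens pay to them, the top `keep` tokens are retained verbatim, and all remaining non-VIP tokens are mean-pooled into a single summary token. The paper's method instead compresses at every layer according to each token's impact on approximating the VIP-token representations.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def vip_centric_compress(hidden, vip_idx, keep):
    """Toy, hypothetical compression of a (seq_len, dim) array of token representations.

    Returns the VIP rows, then the `keep` most-attended non-VIP rows (in original order),
    then one mean-pooled row summarizing every other non-VIP token (omitted if none remain).
    """
    seq_len, dim = hidden.shape
    vip_idx = np.asarray(vip_idx)
    others = np.setdiff1d(np.arange(seq_len), vip_idx)
    # Scaled dot-product attention from VIP tokens (queries) to non-VIP tokens (keys).
    scores = hidden[vip_idx] @ hidden[others].T / np.sqrt(dim)
    importance = softmax(scores, axis=-1).sum(axis=0)  # attention each token receives
    order = np.argsort(-importance)
    kept = np.sort(others[order[:keep]])
    dropped = others[order[keep:]]
    parts = [hidden[vip_idx], hidden[kept]]
    if dropped.size > 0:
        parts.append(hidden[dropped].mean(axis=0, keepdims=True))
    return np.concatenate(parts, axis=0)
```

For example, with 1,000 random tokens of dimension 16, three VIP tokens, and `keep=10`, the output has 3 + 10 + 1 = 14 rows, and the first three rows are the unchanged VIP representations.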
Submitted 27 May, 2023; v1 submitted 7 May, 2023;
originally announced May 2023.
-
Towards Unified AI Drug Discovery with Multiple Knowledge Modalities
Authors:
Yizhen Luo,
Xing Yi Liu,
Kai Yang,
Kui Huang,
Massimo Hong,
Jiahuan Zhang,
Yushuai Wu,
Zaiqing Nie
Abstract:
In recent years, AI models that mine intrinsic patterns from molecular structures and protein sequences have shown promise in accelerating drug discovery. However, these methods partly lag behind the real-world pharmaceutical approaches of human experts, who additionally draw on structured knowledge from knowledge bases and unstructured knowledge from biomedical literature. To bridge this gap, we propose KEDD, a unified, end-to-end, and multimodal deep learning framework that optimally incorporates both structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first extracts underlying characteristics from heterogeneous inputs, and then applies multimodal fusion for accurate prediction. To mitigate the problem of missing modalities, we leverage multi-head sparse attention and a modality masking mechanism to extract relevant information robustly. Benefiting from integrated knowledge, our framework achieves a deeper understanding of molecule entities, brings significant improvements over state-of-the-art methods on a wide range of tasks and benchmarks, and reveals its promising potential in assisting real-world drug discovery.
Submitted 14 October, 2023; v1 submitted 17 April, 2023;
originally announced May 2023.
-
Baryogenesis from sphaleron decoupling
Authors:
Muzi Hong,
Kohei Kamada,
Jun'ichi Yokoyama
Abstract:
The electroweak sphaleron process breaks baryon number conservation within the Standard Model of particle physics (SM). Recently, it has been pointed out that its decoupling may provide the out-of-equilibrium condition required for baryogenesis. In this paper, we study such a scenario, taking into account the baryon-number wash-out effect of the sphaleron itself to improve the estimate. We clarify the amount of CP violation required for this scenario to explain the observed asymmetry.
Submitted 27 April, 2023;
originally announced April 2023.