Showing 1–50 of 111 results for author: Kawaguchi, K

Searching in archive cs.
  1. arXiv:2406.14095  [pdf, other]

    cs.LG cs.AI

    Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

    Authors: Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi

    Abstract: Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the dem… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.11708  [pdf, ps, other]

    math.NA cs.LG math.DS

    Tackling the Curse of Dimensionality in Fractional and Tempered Fractional PDEs with Physics-Informed Neural Networks

    Authors: Zheyuan Hu, Kenji Kawaguchi, Zhongqiang Zhang, George Em Karniadakis

    Abstract: Fractional and tempered fractional partial differential equations (PDEs) are effective models of long-range interactions, anomalous diffusion, and non-local effects. Traditional numerical methods for these problems are mesh-based, thus struggling with the curse of dimensionality (CoD). Physics-informed neural networks (PINNs) offer a promising solution due to their universal approximation, general… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 15 pages

    ACM Class: F.2.2; I.2.7

  3. arXiv:2406.11676  [pdf, other]

    cs.LG math.DS math.NA stat.ML

    Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations

    Authors: Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

    Abstract: We introduce an innovative approach for solving high-dimensional Fokker-Planck-Lévy (FPL) equations in modeling non-Brownian processes across disciplines such as physics, finance, and ecology. We utilize a fractional score function and physics-informed neural networks (PINNs) to lift the curse of dimensionality (CoD) and alleviate numerical overflow from exponentially decaying solutions with dimen…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 16 pages, 1 figure

    ACM Class: F.2.2; I.2.7

  4. arXiv:2406.06793  [pdf, other]

    cs.LG cs.AI

    PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

    Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

    Abstract: Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulate as the horizon of the task grows. On the other hand, models that ca…

    Submitted 10 June, 2024; originally announced June 2024.

  5. arXiv:2406.02847  [pdf, other]

    cs.LG stat.ML

    Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

    Authors: Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi

    Abstract: In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias ter… ▽ More

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  6. arXiv:2405.18540  [pdf, other]

    cs.CL cs.CR cs.LG

    Learning diverse attacks on large language models for robust red-teaming and safety tuning

    Authors: Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

    Abstract: Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that e…

    Submitted 28 May, 2024; originally announced May 2024.

  7. arXiv:2405.18218  [pdf, other]

    cs.LG

    FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

    Authors: Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

    Abstract: Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 22 pages

  8. arXiv:2405.14225  [pdf, other]

    q-bio.QM cs.CL cs.MM

    ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

    Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

    Abstract: Molecule-text modeling, which aims to facilitate molecule-relevant tasks with a textual interface and textual knowledge, is an emerging research direction. Beyond single molecules, studying reaction-text modeling holds promise for helping the synthesis of new materials and drugs. However, previous works mostly neglect reaction-text modeling: they primarily focus on modeling individual molecule-tex…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings, 9 pages

  9. arXiv:2405.12564  [pdf, other]

    q-bio.QM cs.CL cs.MM

    ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

    Authors: Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

    Abstract: Language Models (LMs) excel in understanding textual descriptions of proteins, as evident in biomedical question-answering tasks. However, their capability falters with raw protein data, such as amino acid sequences, due to a deficit in pretraining on such data. Conversely, Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to pro… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: ACL 2024, 9 pages

  10. arXiv:2405.00451  [pdf, other]

    cs.AI cs.LG

    Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

    Authors: Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

    Abstract: We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level… ▽ More

    Submitted 17 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 4 tables (24 pages, 9 figures, 9 tables including references and appendices)

  11. arXiv:2404.13904  [pdf, other]

    cs.LG cs.CV

    Deep Regression Representation Learning with Topology

    Authors: Shihao Zhang, Kenji Kawaguchi, Angela Yao

    Abstract: Most works studying representation learning focus only on classification and neglect regression. Yet, the learning objectives and, therefore, the representation topologies of the two tasks are fundamentally different: classification targets class separation, leading to disconnected representations, whereas regression requires ordinality with respect to the target, leading to continuous representat… ▽ More

    Submitted 16 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ICML 2024

  12. arXiv:2403.06392  [pdf, other]

    cs.LG

    Towards Robust Out-of-Distribution Generalization Bounds via Sharpness

    Authors: Yingtian Zou, Kenji Kawaguchi, Yingnan Liu, Jiashuo Liu, Mong-Li Lee, Wynne Hsu

    Abstract: Generalizing to out-of-distribution (OOD) data or unseen domain, termed OOD generalization, still lacks appropriate theoretical guarantees. Canonical OOD bounds focus on different distance measurements between source and target domains but fail to consider the optimization property of the learned model. As empirically shown in recent work, the sharpness of learned minima influences OOD generalizat… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 40 pages, 9 figures, ICLR 2024 Spotlight Presentation

  13. arXiv:2403.06381  [pdf, other]

    cs.CV

    Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

    Authors: Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi

    Abstract: Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended semantics of the associated text prompts. We examine cross-attention layers in diffusion models and observe a propensity for these layers to disproportionately focus… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  14. arXiv:2403.01251  [pdf, other]

    cs.CL

    Accelerating Greedy Coordinate Gradient via Probe Sampling

    Authors: Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

    Abstract: Safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break the aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called…

    Submitted 27 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  15. arXiv:2402.18913  [pdf, other]

    cs.CL cs.AI

    AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging

    Authors: Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing

    Abstract: As an effective alternative to the direct fine-tuning on target tasks in specific languages, cross-lingual transfer addresses the challenges of limited training data by decoupling "task ability" and "language ability" by fine-tuning on the target task in the source language and another selected task in the target language, respectively. However, they fail to fully separate the task ability fro…

    Submitted 29 February, 2024; originally announced February 2024.

  16. arXiv:2402.18815  [pdf, other]

    cs.CL cs.AI

    How do Large Language Models Handle Multilingualism?

    Authors: Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow (MWork): LLMs initially understand the query, converting multili…

    Submitted 24 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  17. arXiv:2402.16305  [pdf, other]

    cs.LG cs.AI

    Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

    Authors: Xuantong Liu, Tianyang Hu, Wenjia Wang, Kenji Kawaguchi, Yuan Yao

    Abstract: As a dominant force in text-to-image generation tasks, Diffusion Probabilistic Models (DPMs) face a critical challenge in controllability, struggling to adhere strictly to complex, multi-faceted instructions. In this work, we aim to address this alignment challenge for conditional generation tasks. First, we provide an alternative view of state-of-the-art DPMs as a way of inverting advanced Vision… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  18. arXiv:2402.15170  [pdf, other]

    cs.LG cs.AI

    The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling

    Authors: Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi

    Abstract: With the incorporation of the UNet architecture, diffusion probabilistic models have become a dominant force in image generation tasks. One key design in UNet is the skip connections between the encoder and decoder blocks. Although skip connections have been shown to improve training stability and model performance, we reveal that such shortcuts can be a limiting factor for the complexity of the t… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  19. arXiv:2402.13368  [pdf, other]

    cs.LG cs.CV

    Unsupervised Concept Discovery Mitigates Spurious Correlations

    Authors: Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

    Abstract: Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases. Addressing this challenge typically involves methods relying on prior knowledge and group annotation to remove spurious correlations, which may not be readily available in many applications. In this paper, we establish a novel connection between unsupervised object-centric lear… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  20. arXiv:2402.07465  [pdf, other]

    cs.LG cs.AI math.DS math.NA stat.ML

    Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck Equations

    Authors: Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

    Abstract: The Fokker-Planck (FP) equation is a foundational PDE in stochastic processes. However, the curse of dimensionality (CoD) poses challenges when dealing with high-dimensional FP PDEs. Although Monte Carlo and vanilla Physics-Informed Neural Networks (PINNs) have shown the potential to tackle CoD, both methods exhibit numerical errors in high dimensions when dealing with the probability density function…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 22 pages

    MSC Class: 14J60

  21. arXiv:2401.13923  [pdf, other]

    cs.LG cs.IR q-bio.BM

    Towards 3D Molecule-Text Interpretation in Language Models

    Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

    Abstract: Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation, and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecu… ▽ More

    Submitted 17 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  22. arXiv:2401.09067  [pdf, other]

    cs.LG cs.AI cs.CV

    Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding

    Authors: Depeng Li, Tianqi Wang, Junwei Chen, Qining Ren, Kenji Kawaguchi, Zhigang Zeng

    Abstract: Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Various continual learning (CL) methods often rely on exemplar buffers or/and network expansion for balancing model stability and plasticity, which, however, compromises their practical value due to privacy and memory concerns. Instead, this paper considers a strict yet realistic setting, where the tr… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  23. arXiv:2401.04136  [pdf, other]

    cs.CR cs.AI

    The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

    Authors: Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, Kenji Kawaguchi

    Abstract: The commercialization of text-to-image diffusion models (DMs) brings forth potential copyright concerns. Despite numerous attempts to protect DMs from copyright issues, the vulnerabilities of these solutions are underexplored. In this study, we formalized the Copyright Infringement Attack on generative AI models and proposed a backdoor attack method, SilentBadDiffusion, to induce copyright infring… ▽ More

    Submitted 26 May, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted for presentation at ICML 2024

  24. arXiv:2401.02644  [pdf, other]

    cs.LG cs.AI

    Simple Hierarchical Planning with Diffusion

    Authors: Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

    Abstract: Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

  25. arXiv:2401.01623  [pdf, other]

    cs.AI cs.CL

    Can AI Be as Creative as Humans?

    Authors: Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

    Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the da… ▽ More

    Submitted 25 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.io

  26. arXiv:2312.14499  [pdf, other]

    cs.LG cs.AI math.DS math.NA stat.ML

    Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks

    Authors: Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi

    Abstract: Physics-Informed Neural Networks (PINNs) have proven effective in solving partial differential equations (PDEs), especially when some data are available by seamlessly blending data and physics. However, extending PINNs to high-dimensional and even high-order PDEs encounters significant challenges due to the computational cost associated with automatic differentiation in the residual loss. Herein,… ▽ More

    Submitted 3 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Published in Computer Methods in Applied Mechanics and Engineering

    MSC Class: 14J60

    Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 424, 1 May 2024, 116883

  27. arXiv:2312.02614  [pdf, other]

    cs.LG cs.CL

    Prompt Optimization via Adversarial In-Context Learning

    Authors: Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He

    Abstract: We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompt for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough outp… ▽ More

    Submitted 22 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  28. arXiv:2312.00462  [pdf, other]

    cs.CV

    Learning Unorthogonalized Matrices for Rotation Estimation

    Authors: Kerui Gu, Zhihao Li, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao

    Abstract: Estimating 3D rotations is a common procedure for 3D computer vision. The accuracy depends heavily on the rotation representation. One form of representation -- rotation matrices -- is popular due to its continuity, especially for pose estimation tasks. The learning process usually incorporates orthogonalization to ensure orthonormal matrices. Our work reveals, through gradient analysis, that comm… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

  29. arXiv:2312.00057  [pdf, other]

    cs.CR cs.AI cs.CV cs.MM

    VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models

    Authors: Xiang Li, Qianli Shen, Kenji Kawaguchi

    Abstract: The booming use of text-to-image generative models has raised concerns about their high risk of producing copyright-infringing content. While probabilistic copyright protection methods provide a probabilistic guarantee against such infringement, in this paper, we introduce Virtually Assured Amplification Attack (VA3), a novel online attack framework that exposes the vulnerabilities of these protec… ▽ More

    Submitted 2 April, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

    Comments: 18 pages, 9 figures. Accepted to CVPR 2024

  30. arXiv:2311.15283  [pdf, other]

    cs.LG cs.AI math.DS math.NA stat.ML

    Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs

    Authors: Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi

    Abstract: While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochasti… ▽ More

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 21 pages, 5 figures

    MSC Class: 14J60

  31. arXiv:2311.12803  [pdf, other]

    cs.MM cs.AI cs.GR

    On Copyright Risks of Text-to-Image Diffusion Models

    Authors: Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Haonan Wang, Kenji Kawaguchi

    Abstract: Diffusion models excel in many generative modeling tasks, notably in creating images from text prompts, a task referred to as text-to-image (T2I) generation. Despite the ability to generate high-quality images, these models often replicate elements from their training data, leading to increasing copyright concerns in real applications in recent years. In response to this rising concern about copy…

    Submitted 18 February, 2024; v1 submitted 14 September, 2023; originally announced November 2023.

    Comments: 16 pages including appendix

  32. arXiv:2311.08385  [pdf, other]

    cs.CL

    ChOiRe: Characterizing and Predicting Human Opinions with Chain of Opinion Reasoning

    Authors: Xuan Long Do, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

    Abstract: Aligning language models (LMs) with human opinion is challenging yet vital to enhance their grasp of human values, preferences, and beliefs. We present ChOiRe, a four-step framework to predict human opinion which differentially models the user explicit personae (i.e. demographic or ideological attributes) that are manually declared, and implicit personae inferred from user historical opinions. ChO… ▽ More

    Submitted 27 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 22 pages

  33. arXiv:2310.14753  [pdf, other]

    cs.LG

    Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

    Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

    Abstract: Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, whi… ▽ More

    Submitted 14 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023. 10 pages

  34. arXiv:2310.12798  [pdf, other]

    cs.CL cs.MM

    MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

    Authors: Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

    Abstract: Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception - a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an… ▽ More

    Submitted 18 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: EMNLP main conference. 9 pages

  35. arXiv:2310.06923  [pdf, other]

    cs.AI cs.LG

    PICProp: Physics-Informed Confidence Propagation for Uncertainty Quantification

    Authors: Qianli Shen, Wai Hoh Tang, Zhun Deng, Apostolos Psaros, Kenji Kawaguchi

    Abstract: Standard approaches for uncertainty quantification in deep learning and physics-informed learning have persistent limitations. Indicatively, strong assumptions regarding the data likelihood are required, the performance highly depends on the selection of priors, and the posterior can be sampled only approximately, which leads to poor approximations because of the associated computational cost. Thi… ▽ More

    Submitted 20 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023. Code is available at https://github.com/ShenQianli/PICProp

  36. arXiv:2310.06514  [pdf, other]

    cs.LG

    AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

    Authors: Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi

    Abstract: Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend to test faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground truth input features. This idea as…

    Submitted 14 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Appeared at NeurIPS 2023 Workshop XAIA

  37. arXiv:2310.06511  [pdf, other]

    cs.LG

    Self-Supervised Dataset Distillation for Transfer Learning

    Authors: Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

    Abstract: Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient… ▽ More

    Submitted 11 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  38. arXiv:2310.00841  [pdf, other]

    cs.LG

    Drug Discovery with Dynamic Goal-aware Fragments

    Authors: Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang

    Abstract: Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update… ▽ More

    Submitted 30 May, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  39. arXiv:2308.08949  [pdf, other]

    cs.LG cs.AI

    A Dual-Perspective Approach to Evaluating Feature Attribution Methods

    Authors: Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

    Abstract: Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). Wh… ▽ More

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 16 pages, 14 figures

  40. arXiv:2307.12306  [pdf, other]

    cs.LG cs.AI math.DS math.NA stat.ML

    Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

    Authors: Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi

    Abstract: The curse-of-dimensionality taxes computational resources heavily with exponentially increasing computational cost as the dimension increases. This poses great challenges in solving high-dimensional PDEs, as Richard E. Bellman first pointed out over 60 years ago. While there has been some recent success in solving numerically partial differential equations (PDEs) in high dimensions, such computati… ▽ More

    Submitted 17 May, 2024; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: Accepted by Neural Networks. Code is available at https://github.com/zheyuanhu01/SDGD_PINN

    MSC Class: 14J60

    ACM Class: F.2.2; I.2.7

    Journal ref: Neural Networks, Volume 176, 2024, 106369, ISSN 0893-6080

  41. arXiv:2306.10480  [pdf, other]

    cs.LG cs.CV

    IF2Net: Innately Forgetting-Free Networks for Continual Learning

    Authors: Depeng Li, Tianqi Wang, Bingrong Xu, Kenji Kawaguchi, Zhigang Zeng, Ponnuthurai Nagaratnam Suganthan

    Abstract: Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, in which information is stored in weights on connections, we investigated how to design an Innately Forgetting-Free Network (IF2Net) for continual learning context. This study proposed a straightforward yet effective learning paradigm… ▽ More

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: 16 pages, 8 figures. Under review

  42. Multi-View Class Incremental Learning

    Authors: Depeng Li, Tianqi Wang, Junwei Chen, Kenji Kawaguchi, Cheng Lian, Zhigang Zeng

    Abstract: Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. To make MVL methods more practical in an open-ended environment, this paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream…

    Submitted 13 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to Information Fusion

    Journal ref: Information Fusion, 2023, 102, 102021

  43. arXiv:2306.06991  [pdf, other]

    cs.CV cs.AI cs.LG

    Fast Diffusion Model

    Authors: Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang

    Abstract: Diffusion models (DMs) have been adopted across diverse fields with their remarkable abilities in capturing intricate data distributions. In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling. We first find that the diffusion process of DMs accords with the stochastic optimization process…

    Submitted 4 October, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  44. arXiv:2305.18887  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.IT

    How Does Information Bottleneck Help Deep Learning?

    Authors: Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang

    Abstract: Numerous deep learning algorithms have been inspired by and understood via the notion of information bottleneck, where unnecessary information is (often implicitly) minimized while task-relevant information is maximized. However, a rigorous argument for justifying why it is desirable to control information bottlenecks has been elusive. In this paper, we provide the first rigorous learning theory f…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted at ICML 2023. Code is available at https://github.com/xu-ji/information-bottleneck

  45. arXiv:2305.18395  [pdf, other]

    cs.CL cs.AI cs.LG

    Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

    Authors: Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

    Abstract: Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tu…

    Submitted 30 October, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  46. arXiv:2305.14333  [pdf, other]

    cs.CL cs.AI

    Automatic Model Selection with Large Language Models for Reasoning

    Authors: James Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, Michael Qizhe Xie

    Abstract: Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths. CoT employs natural language, offering flexibility and interpretability, while PAL utilizes programming language, yielding more structured and rigorous logic. We introduce a model selection method to combine the best of both worlds by employing a large language mode…

    Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings

  47. arXiv:2305.05208  [pdf, other]

    cs.CV

    Boosting Visual-Language Models by Exploiting Hard Samples

    Authors: Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi

    Abstract: Contrastive Language-Image Pre-training (CLIP) has become the standard for learning cross-modal representations between images and text. Efforts to improve its capabilities typically demand the collection of additional data and retraining with new loss functions. While effective, the added requirements limit their practical use due to the increased resource and time investments needed. In this wor…

    Submitted 10 March, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: The code is publicly available at https://github.com/haonan3/HELIP

  48. arXiv:2305.00633  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-Evaluation Guided Beam Search for Reasoning

    Authors: Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie

    Abstract: Breaking down a problem into intermediate steps has demonstrated impressive performance in Large Language Model (LLM) reasoning. However, the growth of the reasoning chain introduces uncertainty and error accumulation, making it challenging to elicit accurate final results. To tackle this challenge of uncertainty in multi-step reasoning, we introduce a stepwise self-evaluation mechanism to guide a…

    Submitted 25 October, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: NeurIPS 2023. 10 pages, 7 figures, 4 tables (33 pages, 14 figures, 15 tables including references and appendices)

  49. arXiv:2304.03935  [pdf, other]

    cs.LG

    Last-Layer Fairness Fine-tuning is Simple and Effective for Neural Networks

    Authors: Yuzhen Mao, Zhun Deng, Huaxiu Yao, Ting Ye, Kenji Kawaguchi, James Zou

    Abstract: As machine learning has been deployed ubiquitously across applications in modern data science, algorithmic fairness has become a great concern. Among them, imposing fairness constraints during learning, i.e. in-processing fair training, has been a popular type of training method because they don't require accessing sensitive attributes during test time in contrast to post-processing methods. While…

    Submitted 14 July, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Published at the ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability

  50. arXiv:2303.00633  [pdf, other]

    cs.IT cs.AI

    An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun

    Abstract: Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks. However, the fundamental mechanisms underlying VICReg remain unexplored. In this paper, we present an information-theoretic perspective on the VICReg objective. We begin by deriving information-theoretic quantities for deterministic networks as a…

    Submitted 1 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.