Showing 1–50 of 280 results for author: Hsieh, C

Searching in archive cs.
  1. arXiv:2407.00256  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

    Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction.…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024. code available at https://github.com/ruocwang/mixture-of-prompts

    MSC Class: 68T01

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

  2. arXiv:2406.17806  [pdf, other]

    cs.CL cs.AI cs.CR cs.CV cs.LG

    MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

    Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

    Abstract: Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond queries under safety mechanism, they sometimes reject harmless queries in the presence of…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.17224  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    Large Language Models are Interpretable Learners

    Authors: Ruochen Wang, Si Si, Felix Yu, Dorothea Wiesmann, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show a combination of Large Language Models (LLMs) and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Preliminary Version, Code at https://github.com/ruocwang/llm-symbolic-program

    MSC Class: 68T05

  4. arXiv:2406.16008  [pdf, other]

    cs.CL cs.AI cs.LG

    Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

    Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

    Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  5. arXiv:2406.12910  [pdf]

    cs.LG cs.AI cs.NE physics.chem-ph q-bio.BM

    Human-level molecular optimization driven by mol-gene evolution

    Authors: Jiebin Fang, Churu Mao, Yuchen Zhu, Xiaoming Chen, Chang-Yu Hsieh, Zhongjun Ma

    Abstract: De novo molecule generation allows the search for more drug-like hits across a vast chemical space. However, lead optimization is still required, and the process of optimizing molecular structures faces the challenge of balancing structural novelty with pharmacological properties. This study introduces the Deep Genetic Molecular Modification Algorithm (DGMM), which brings structure modification to…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.11794  [pdf, other]

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  7. arXiv:2406.06609  [pdf, other]

    cs.LG cs.AI cs.CV

    Ameliorate Spurious Correlations in Dataset Condensation

    Authors: Justin Cui, Ruochen Wang, Yuanhao Xiong, Cho-Jui Hsieh

    Abstract: Dataset Condensation has emerged as a technique for compressing large datasets into smaller synthetic counterparts, facilitating downstream training tasks. In this paper, we study the impact of bias inside the original dataset on the performance of dataset condensation. With a comprehensive empirical evaluation on canonical datasets with color, corruption and background biases, we found that color…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ICML

  8. arXiv:2406.05184  [pdf, other]

    cs.CV

    The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

    Authors: Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna

    Abstract: Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. What additional value does the intermediate generator provide over directly training on relevant parts of the up…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Correspondence to [email protected]. RK and PWK equally advised the project

  9. arXiv:2406.03720  [pdf, other]

    cs.CV cs.MM

    JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits

    Authors: Minzhou Pan, Yi Zeng, Xue Lin, Ning Yu, Cho-Jui Hsieh, Peter Henderson, Ruoxi Jia

    Abstract: In this study, we investigate the vulnerability of image watermarks to diffusion-model-based image editing, a challenge exacerbated by the computational cost of accessing gradient information and the closed-source nature of many diffusion models. To address this issue, we introduce JIGMARK. This first-of-its-kind watermarking technique enhances robustness through contrastive learning with pairs of…

    Submitted 5 June, 2024; originally announced June 2024.

  10. arXiv:2406.02965  [pdf, other]

    cs.CV

    Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

    Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative…

    Submitted 5 June, 2024; originally announced June 2024.

  11. arXiv:2406.01970  [pdf, other]

    cs.CV cs.AI

    The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

    Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are "universal" and can be generalized across various positio…

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2405.21063  [pdf, other]

    cs.LG cs.AI

    Neural Network Verification with Branch-and-Bound for General Nonlinearities

    Authors: Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, Huan Zhang

    Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide whic…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint

  13. arXiv:2405.20947  [pdf, other]

    cs.CL cs.AI

    OR-Bench: An Over-Refusal Benchmark for Large Language Models

    Authors: Justin Cui, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation, the enhanced safety often come with the side effect of over-refusal, where LLMs may reject innocuous prompts and become less helpful. Although the issue of over-refusal has been empirically observed, a systematic measurement is cha…

    Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: version 2, 10 pages main, 22 pages total

  14. arXiv:2405.04545  [pdf, other]

    cs.LG cs.IR

    Learning label-label correlations in Extreme Multi-label Classification via Label Features

    Authors: Siddhant Kharbanda, Devaansh Gupta, Erik Schultheis, Atmadeep Banerjee, Cho-Jui Hsieh, Rohit Babbar

    Abstract: Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in a…

    Submitted 3 May, 2024; originally announced May 2024.

  15. arXiv:2405.03714  [pdf, other]

    cs.LG cs.AI

    UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification

    Authors: Siddhant Kharbanda, Devaansh Gupta, Gururaj K, Pankaj Malhotra, Cho-Jui Hsieh, Rohit Babbar

    Abstract: Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally used modular approach with (i) a Dual Encoder (DE) to embed the queries and label texts, (ii) a One-vs-All classifier to rerank the shortlisted labels mined through…

    Submitted 4 May, 2024; originally announced May 2024.

  16. arXiv:2404.19230  [pdf]

    q-bio.BM cs.AI

    Deep Lead Optimization: Leveraging Generative AI for Structural Modification

    Authors: Odin Zhang, Haitao Lin, Hui Zhang, Huifeng Zhao, Yufei Huang, Yuansheng Huang, Dejun Jiang, Chang-yu Hsieh, Peichen Pan, Tingjun Hou

    Abstract: The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead opt…

    Submitted 29 April, 2024; originally announced April 2024.

  17. arXiv:2404.17709  [pdf, other]

    stat.ML cs.LG

    Low-rank Matrix Bandits with Heavy-tailed Rewards

    Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

    Abstract: In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $Θ^*$ with rank $r \ll d_1\wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of \underline{low}-rank…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: The 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

  18. arXiv:2404.10573  [pdf, other]

    cs.AI cs.CE q-bio.BM

    AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation

    Authors: Lijun Liu, Jiali Yang, Jianfei Song, Xinglin Yang, Lele Niu, Zeqi Cai, Hui Shi, Tingjun Hou, Chang-yu Hsieh, Weiran Shen, Yafeng Deng

    Abstract: Recombinant adeno-associated virus (rAAV) vectors have revolutionized gene therapy, but their broad tropism and suboptimal transduction efficiency limit their clinical applications. To overcome these limitations, researchers have focused on designing and screening capsid libraries to identify improved vectors. However, the large sequence space and limited resources present challenges in identifyin…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  19. arXiv:2404.07956  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY math.OC

    Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

    Authors: Lujie Yang, Hongkai Dai, Zhouxing Shi, Cho-Jui Hsieh, Russ Tedrake, Huan Zhang

    Abstract: Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control. However, formal (Lyapunov) stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain, and most existing approaches rely on expensive solvers such as sums-of-squares (SOS), mixed…

    Submitted 4 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Paper accepted by ICML 2024

  20. arXiv:2404.06654  [pdf, other]

    cs.CL

    RULER: What's the Real Context Size of Your Long-Context Language Models?

    Authors: Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg

    Abstract: The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-con…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  21. arXiv:2404.00270  [pdf, other]

    cs.DC cs.DS

    Engineering A Workload-balanced Push-Relabel Algorithm for Massive Graphs on GPUs

    Authors: Chou-Ying Hsieh, Po-Chieh Lin, Sy-Yen Kuo

    Abstract: The push-relabel algorithm is an efficient algorithm that solves the maximum flow/ minimum cut problems of its affinity to parallelization. As the size of graphs grows exponentially, researchers have used Graphics Processing Units (GPUs) to accelerate the computation of the push-relabel algorithm further. However, prior works need to handle the significant memory consumption to represent a massive…

    Submitted 30 March, 2024; originally announced April 2024.

  22. arXiv:2404.00014  [pdf]

    physics.chem-ph cs.AI q-bio.BM

    Deep Geometry Handling and Fragment-wise Molecular 3D Graph Generation

    Authors: Odin Zhang, Yufei Huang, Shichen Cheng, Mengyao Yu, Xujun Zhang, Haitao Lin, Yundian Zeng, Mingyang Wang, Zhenxing Wu, Huifeng Zhao, Zaixi Zhang, Chenqing Hua, Yu Kang, Sunliang Cui, Peichen Pan, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Most earlier 3D structure-based molecular generation approaches follow an atom-wise paradigm, incrementally adding atoms to a partially built molecular fragment within protein pockets. These methods, while effective in designing tightly bound ligands, often overlook other essential properties such as synthesizability. The fragment-wise generation paradigm offers a promising solution. However, a co…

    Submitted 15 March, 2024; originally announced April 2024.

  23. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  24. arXiv:2402.16914  [pdf, other]

    cs.CR cs.AI cs.CL

    DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

    Authors: Xirui Li, Ruochen Wang, Minhao Cheng, Tianyi Zhou, Cho-Jui Hsieh

    Abstract: The safety alignment of Large Language Models (LLMs) is vulnerable to both manual and automated jailbreak attacks, which adversarially trigger LLMs to output harmful content. However, current methods for jailbreaking LLMs, which nest entire harmful prompts, are not effective at concealing malicious intent and can be easily identified and rejected by well-aligned LLMs. This paper discovers that dec…

    Submitted 1 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  25. arXiv:2402.16459  [pdf, other]

    cs.CL cs.AI

    Defending LLMs against Jailbreaking Attacks via Backtranslation

    Authors: Yihan Wang, Zhouxing Shi, Andrew Bai, Cho-Jui Hsieh

    Abstract: Although many large language models (LLMs) have been trained to refuse harmful requests, they are still vulnerable to jailbreaking attacks which rewrite the original prompt to conceal its harmful intent. In this paper, we propose a new method for defending LLMs against jailbreaking attacks by "backtranslation". Specifically, given an initial response generated by the target LLM from an input pro…

    Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  26. arXiv:2402.15751  [pdf, other]

    cs.LG cs.AI cs.CL

    Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

    Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

    Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient est…

    Submitted 24 February, 2024; originally announced February 2024.

  27. arXiv:2402.15666  [pdf, other]

    cs.LG cs.AI cs.IR

    Universal Model in Online Customer Service

    Authors: Shu-Ting Pi, Cheng-Ping Hsieh, Qun Liu, Yuying Zhu

    Abstract: Building machine learning models can be a time-consuming process that often takes several months to implement in typical business scenarios. To ensure consistent model performance and account for variations in data distribution, regular retraining is necessary. This paper introduces a solution for improving online customer service in e-commerce by presenting a universal model for predicting label…

    Submitted 23 February, 2024; originally announced February 2024.

    Journal ref: Companion Proceedings of the ACM Web Conference 2023

  28. arXiv:2402.12741  [pdf, other]

    cs.CV

    MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion

    Authors: Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou

    Abstract: Existing text-to-image models still struggle to generate images of multiple objects, especially in handling their spatial positions, relative sizes, overlapping, and attribute bindings. To efficiently address these challenges, we develop a training-free Multimodal-LLM agent (MuLan), as a human painter, that can progressively generate multi-object with intricate planning and feedback control. MuLan…

    Submitted 24 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Added the application to human-agent interaction; added discussion with concurrent work

  29. arXiv:2402.10516  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative AI for Controllable Protein Sequence Design: A Survey

    Authors: Yiheng Zhu, Zitai Kong, Jialu Wu, Weize Liu, Yuqiang Han, Mingze Yin, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou

    Abstract: The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particul…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 9 pages

  30. arXiv:2402.08096  [pdf, other]

    cs.LG

    Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?

    Authors: Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly

    Abstract: Fine-tuning pretrained foundational models on specific tasks is now the de facto approach for text and vision tasks. A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning. Rehearsing samples randomly from the pretrain dataset is a common approach to alleviate such forgetting. However, we find that random mixing unintentionally includes samples w…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 17 pages, 13 figures

  31. arXiv:2402.01057  [pdf, other]

    cs.LG

    Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

    Authors: Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun, Chien Feng, Cho-Jui Hsieh, Chun-Yi Lee

    Abstract: In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible and the ground truth reward function is not available. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory.…

    Submitted 30 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Published at ICML 2024. Code: https://github.com/stanl1y/tdil

  32. arXiv:2401.09031  [pdf, other]

    cs.LG

    Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

    Authors: Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

    Abstract: Data attribution methods trace model behavior back to its training dataset, offering an effective approach to better understand "black-box" neural networks. While prior research has established quantifiable links between model output and training data in diverse settings, interpreting diffusion model outputs in relation to training samples remains underexplored. In particular, diffusion models o…

    Submitted 21 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  33. arXiv:2401.07298  [pdf, other]

    stat.ML cs.LG

    Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

    Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

    Abstract: In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown $d_1$ by $d_2$ matrix $Θ^*$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized lo…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Revision of the paper accepted by NeurIPS 2022

  34. arXiv:2401.05039  [pdf, other]

    cs.DC

    Accelerating Maximal Biclique Enumeration on GPUs

    Authors: Chou-Ying Hsieh, Chia-Ming Chang, Po-Hsiu Cheng, Sy-Yen Kuo

    Abstract: Maximal Biclique Enumeration (MBE) holds critical importance in graph theory with applications extending across fields such as bioinformatics, social networks, and recommendation systems. However, its computational complexity presents barriers for efficiently scaling to large graphs. To address these challenges, we introduce cuMBE, a GPU-optimized parallel algorithm for MBE. Utilizing a unique dat…

    Submitted 10 January, 2024; originally announced January 2024.

  35. arXiv:2312.14934  [pdf]

    cs.NI cs.AI cs.CY cs.SD eess.AS

    aoip.ai: An Open-Source P2P SDK

    Authors: Joseph Konan, Shikhar Agnihotri, Chia-Chun Hsieh

    Abstract: This white paper introduces aoip.ai, a groundbreaking open-source SDK incorporating peer-to-peer technology and advanced AI integration to transform VoIP and IoT applications. It addresses key market challenges by enhancing data security, elevating communication quality, and providing greater flexibility for developers and users. Developed in collaboration with Carnegie Mellon University, aoip.ai…

    Submitted 1 December, 2023; originally announced December 2023.

  36. arXiv:2312.14380  [pdf, other]

    cs.LG

    Federated Learning with Projected Trajectory Regularization

    Authors: Tiejin Chen, Yuanpu Cao, Yujia Wang, Cho-Jui Hsieh, Jinghui Chen

    Abstract: Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients, which leads to deteriorated model training performances. Prior works in this line of research mainly focus on utilizing last-step global model parameters/gradients or…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 9 pages

  37. arXiv:2312.12433  [pdf, other]

    cs.CV cs.AI cs.LG

    TAO-Amodal: A Benchmark for Tracking Any Object Amodally

    Authors: Cheng-Yen Hsieh, Kaihua Chen, Achal Dave, Tarasha Khurana, Deva Ramanan

    Abstract: Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \…

    Submitted 2 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Project Page: https://tao-amodal.github.io

  38. PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models

    Authors: Wei-Cheng Chang, Jyun-Yu Jiang, Jiong Zhang, Mutasem Al-Darabsah, Choon Hui Teo, Cho-Jui Hsieh, Hsiang-Fu Yu, S. V. N. Vishwanathan

    Abstract: Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems due to powerful large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stages pipelines (e.g., pre-training, fine-tuning, distillation). In this work, we propose th…

    Submitted 5 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accept by WSDM 2024

  39. arXiv:2311.16588  [pdf]

    cs.CL

    Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

    Authors: Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D. L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li

    Abstract: This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing f…

    Submitted 9 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 5 figures, 4 tables

  40. arXiv:2311.10117  [pdf, other]

    cs.AI cs.LG

    Automatic Engineering of Long Prompts

    Authors: Cho-Jui Hsieh, Si Si, Felix X. Yu, Inderjit S. Dhillon

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in solving complex open-domain tasks, guided by comprehensive instructions and demonstrations provided in the form of prompts. However, these prompts can be lengthy, often comprising hundreds of lines and thousands of tokens, and their design often requires considerable human effort. Recent research has explored automatic promp…

    Submitted 16 November, 2023; originally announced November 2023.

  41. arXiv:2311.10085  [pdf, other]

    cs.LG cs.CL math.OC

    A Computationally Efficient Sparsified Online Newton Method

    Authors: Fnu Devvrit, Sai Surya Duvvuri, Rohan Anil, Vineet Gupta, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order alg…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 30 pages. First two authors contributed equally. Accepted at NeurIPS 2023

  42. arXiv:2311.09668  [pdf, other]

    cs.CL cs.CR cs.LG

    Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring

    Authors: Yuhang Li, Yihan Wang, Zhouxing Shi, Cho-Jui Hsieh

    Abstract: The strong general capabilities of Large Language Models (LLMs) bring potential ethical risks if they are unrestrictedly accessible to malicious users. Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions with a private random number generator seeded by its prefix tokens. However, this watermarking algorithm alters the logits during gen…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Work in progress

  43. arXiv:2311.02798  [pdf, other]

    cs.LG physics.chem-ph q-bio.QM

    From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning

    Authors: Yue Wan, Jialu Wu, Tingjun Hou, Chang-Yu Hsieh, Xiaowei Jia

    Abstract: Reliable molecular property prediction is essential for various scientific endeavors and industrial applications, such as drug discovery. However, the data scarcity, combined with the highly non-linear causal relationships between physicochemical and biological properties and conventional molecular featurization schemes, complicates the development of robust molecular machine learning models. Self…

    Submitted 30 June, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

  44. arXiv:2310.11866  [pdf, ps, other]

    cs.LG

    Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function

    Authors: Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao

    Abstract: Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties for non-convex optimization by concurrently computing function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations help largely reduce the computational cost, it is challenging to theoreti…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:1809.09853

  45. arXiv:2310.09468  [pdf, other]

    quant-ph cs.LG

    Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems

    Authors: Lucas Tecot, Cho-Jui Hsieh

    Abstract: In the field of quantum information, classical optimizers play an important role. From experimentalists optimizing their physical devices to theorists exploring variational quantum algorithms, many aspects of quantum information require the use of a classical optimizer. For this reason, there are many papers that benchmark the effectiveness of different optimizers for specific quantum optimization…

    Submitted 13 October, 2023; originally announced October 2023.

  46. arXiv:2310.07269  [pdf, other]

    cs.LG math.OC stat.ML

    Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

    Authors: Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu

    Abstract: The challenge of overfitting, in which the model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method, which can improve the generalization of neural networks even in the presence of label noise. However,…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 52 pages, 4 figures, 2 tables. In NeurIPS 2023

  47. arXiv:2310.05175  [pdf, other]

    cs.LG

    Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

    Authors: Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

    Abstract: Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters that can be pruned in one-shot without…

    Submitted 6 May, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  48. arXiv:2310.05007  [pdf, other]

    cs.CL

    MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering

    Authors: Xiusi Chen, Jyun-Yu Jiang, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Wei Wang

    Abstract: Recent advances in few-shot question answering (QA) mostly rely on the power of pre-trained large language models (LLMs) and fine-tuning in specific settings. Although the pre-training stage has already equipped LLMs with powerful reasoning capabilities, LLMs still need to be fine-tuned to adapt to specific domains to achieve the best results. In this paper, we propose to select the most informati…

    Submitted 28 May, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: ACL 2024 main conference

  49. arXiv:2309.11968  [pdf, other]

    quant-ph cs.IT

    Quantum complementarity: A novel resource for unambiguous exclusion and encryption

    Authors: Chung-Yun Hsieh, Roope Uola, Paul Skrzypczyk

    Abstract: Complementarity is a phenomenon explaining several core features of quantum theory, such as the well-known uncertainty principle. Roughly speaking, two objects are said to be complementary if being certain about one of them necessarily forbids useful knowledge about the other. Two quantum measurements that do not commute form an example of complementary measurements, and this phenomenon can also b…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 6+5 pages; 4 figures

  50. arXiv:2308.00675  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

    Authors: Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

    Abstract: Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones t…

    Submitted 1 August, 2023; originally announced August 2023.