Showing 1–50 of 158 results for author: Xing, E P

Searching in archive cs.
  1. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  2. arXiv:2406.09455  [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  3. arXiv:2406.00519  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode…

    Submitted 1 June, 2024; originally announced June 2024.

  4. arXiv:2404.02852  [pdf, other]

    cs.LG

    Toward Inference-optimal Mixture-of-Expert Large Language Models

    Authors: Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang

    Abstract: Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation on the model size and number of token…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

  5. arXiv:2402.19009  [pdf, other]

    cs.LG cs.AI

    Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

    Authors: Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu

    Abstract: The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, and…

    Submitted 5 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: ICML 2024 camera-ready. Code is available at https://github.com/guangyliu/EDDPM

  6. arXiv:2402.16840  [pdf, other]

    cs.CL

    MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

    Authors: Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan

    Abstract: "Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. However, LLMs do not suit well for scenarios that require on-device processing, energy efficiency, low memory footprint, and response efficiency. These requisites are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the chall…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Code available at: https://github.com/mbzuai-oryx/MobiLlama

  7. arXiv:2312.06550  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM360: Towards Fully Transparent Open-Source LLMs

    Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

    Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder prog…

    Submitted 11 December, 2023; originally announced December 2023.

  8. arXiv:2311.12023  [pdf, other]

    cs.CL cs.LG

    LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

    Authors: Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim

    Abstract: We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming form…

    Submitted 30 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  9. arXiv:2310.16427  [pdf, other]

    cs.CL

    PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

    Authors: Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu

    Abstract: Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth…

    Submitted 7 December, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 34 pages, 10 figures

  10. arXiv:2310.11340  [pdf, other]

    stat.ML cs.LG

    Contextualized Machine Learning

    Authors: Benjamin Lengerich, Caleb N. Ellington, Andrea Rubbi, Manolis Kellis, Eric P. Xing

    Abstract: We examine Contextualized Machine Learning (ML), a paradigm for learning heterogeneous and context-dependent effects. Contextualized ML estimates heterogeneous functions by applying deep learning to the meta-relationship between contextual information and context-specific parametric models. This is a form of varying-coefficient modeling that unifies existing frameworks including cluster analysis a…

    Submitted 17 October, 2023; originally announced October 2023.

  11. arXiv:2310.07918  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning

    Authors: Jannik Deuschel, Caleb N. Ellington, Yingtao Luo, Benjamin J. Lengerich, Pascal Friederich, Eric P. Xing

    Abstract: Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, w…

    Submitted 7 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  12. arXiv:2310.03294  [pdf, other]

    cs.LG cs.AI cs.DC

    DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

    Authors: Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

    Abstract: FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU. In this paper, we introduce DISTFLASHATTN, a distributed memory-efficient attention mechanism optimized for long-context LLMs training. We propose three key techniques: token-level workload balancing, overlapping key-value communicatio…

    Submitted 31 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  13. arXiv:2310.03163  [pdf, other]

    cs.LG

    FedNAR: Federated Optimization with Normalized Annealing Regularization

    Authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric P. Xing, Hongyi Wang

    Abstract: Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfi…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Thirty-seventh Conference on Neural Information Processing Systems

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems, 2023

  14. arXiv:2309.11998  [pdf, other]

    cs.CL cs.AI

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our Vicuna demo and…

    Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  15. arXiv:2306.05685  [pdf, other]

    cs.CL cs.AI

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

    Abstract: Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement…

    Submitted 23 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  16. arXiv:2306.04898  [pdf, other]

    cs.LG cs.CV

    Understanding Masked Autoencoders via Hierarchical Latent Variable Models

    Authors: Lingjing Kong, Martin Q. Ma, Guangyi Chen, Eric P. Xing, Yuejie Chi, Louis-Philippe Morency, Kun Zhang

    Abstract: Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empiric…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: CVPR 2023 Highlight

  17. arXiv:2305.02538  [pdf, other]

    cs.LG

    Cuttlefish: Low-Rank Model Training without All the Tuning

    Authors: Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos

    Abstract: Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training necessitates adjusting several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challen…

    Submitted 5 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at MLSys 2023

  18. arXiv:2302.04228  [pdf, other]

    cs.LG

    Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach

    Authors: Han Guo, Philip Greengard, Hongyi Wang, Andrew Gelman, Yoon Kim, Eric P. Xing

    Abstract: The canonical formulation of federated learning treats it as a distributed optimization problem where the model parameters are optimized against a global loss function that decomposes across client loss functions. A recent alternative formulation instead treats federated learning as a distributed inference problem, where the goal is to infer a global posterior from partitioned client data (Al-Shed…

    Submitted 8 February, 2023; originally announced February 2023.

  19. arXiv:2301.02654  [pdf, other]

    cs.LG

    Does compressing activations help model parallel training?

    Authors: Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

    Abstract: Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: 16 pages, 5 figures

  20. arXiv:2212.04875  [pdf, other]

    cs.CV cs.AI

    Expeditious Saliency-guided Mix-up through Random Gradient Thresholding

    Authors: Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang

    Abstract: Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. Over the years, the research community expands mix-up methods into two directions, with extensive efforts to improve saliency-guided procedures but minimal focus on the arbitrary path, leaving the randomization domain unexplored. In this paper, inspired by the superior qualities…

    Submitted 10 August, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted Long paper at 2nd Practical-DL Workshop at AAAI 2023

  21. arXiv:2211.05322  [pdf, other]

    cs.LG cs.DC

    On Optimizing the Communication of Model Parallelism

    Authors: Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters. In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to…

    Submitted 9 November, 2022; originally announced November 2022.

  22. arXiv:2211.01452  [pdf, other]

    cs.LG cs.CR

    MPCFormer: fast, performant and private Transformer inference with MPC

    Authors: Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

    Abstract: Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions can increase the inference latency by more than 60x or significantly compromise the inference quality. In this paper, we design the framework MPCFORMER as a practical solution, using Secure Multi-Party Computation (MPC) and Knowledge Distillati…

    Submitted 16 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  23. arXiv:2210.04325  [pdf, other]

    cs.CL cs.AI cs.LG

    ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models

    Authors: Jiannan Xiang, Zhengzhong Liu, Yucheng Zhou, Eric P. Xing, Zhiting Hu

    Abstract: Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarce issues: one may have access to o…

    Submitted 22 October, 2022; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  24. arXiv:2208.00219  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

    Authors: Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, Eric P. Xing

    Abstract: Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes. Such limitations hinder the generalization of base-cla…

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: Accepted by T-PAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence). Codes: https://github.com/ZhangGongjie/Meta-DETR

  25. arXiv:2207.14172  [pdf, other]

    cs.CV

    Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

    Authors: Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, Eric P. Xing

    Abstract: The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between obj…

    Submitted 6 February, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

  26. arXiv:2207.08944  [pdf, other]

    cs.CV cs.LG

    Robustar: Interactive Toolbox Supporting Precise Data Annotation for Robust Vision Learning

    Authors: Chonghan Chen, Haohan Wang, Leyang Hu, Yuhao Zhang, Shuguang Lyu, Jingcheng Wu, Xinnuo Li, Linjing Sun, Eric P. Xing

    Abstract: We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building upon the recent understanding that the lack of machine learning model's robustness is the tendency of the model's learning of spurious features, we aim to solve this problem from its root at the data perspective…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: This paper introduces the first release of our software. The paper is expected to be updated as we continue to develop the software

  27. arXiv:2207.08943  [pdf, ps, other]

    cs.CL cs.LG

    MRCLens: an MRC Dataset Bias Detection Toolkit

    Authors: Yifan Zhong, Haohan Wang, Eric P. Xing

    Abstract: Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension, but evidence suggests sometimes the models take advantage of dataset biases to predict and fail to generalize on out-of-sample data. While many other approaches have been proposed to address this issue from the computation perspective such as new architectures or training procedures, we believe a me…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: DataPerf workshop at ICML

  28. arXiv:2206.14268  [pdf, other]

    cs.CL

    BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models

    Authors: Shibo Hao, Bowen Tan, Kaiwen Tang, Bin Ni, Xiyan Shao, Hengzhe Zhang, Eric P. Xing, Zhiting Hu

    Abstract: It is crucial to automatically construct knowledge graphs (KGs) of diverse new relations to support knowledge discovery and broad applications. Previous KG construction methods, based on either crowdsourcing or text mining, are often limited to a small predefined set of relations due to manual cost or restrictions in text corpus. Recent research proposed to use pretrained language models (LMs) as…

    Submitted 2 June, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: ACL 2023 (Findings); Code available at https://github.com/tanyuqian/knowledge-harvest-from-lms

  29. arXiv:2206.01909  [pdf, ps, other]

    cs.LG

    Toward Learning Robust and Invariant Representations with Alignment Regularization and Data Augmentation

    Authors: Haohan Wang, Zeyi Huang, Xindi Wu, Eric P. Xing

    Abstract: Data augmentation has been proven to be an effective technique for developing machine learning models that are robust to known classes of distributional shifts (e.g., rotations of images), and alignment regularization is a technique often used together with data augmentation to further help the model learn representations invariant to the shifts used to augment the data. In this paper, motivated b…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: to appear at KDD 2022, the software package is at https://github.com/jyanln/AlignReg. arXiv admin note: text overlap with arXiv:2011.13052

  30. arXiv:2205.12548  [pdf, other]

    cs.CL cs.LG

    RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning

    Authors: Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu

    Abstract: Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only few downstream data are available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompt (e.g., embeddings) which falls short of interpretability, reusability across LMs, and applicab…

    Submitted 22 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 Camera Ready. Code available at https://github.com/mingkaid/rl-prompt

  31. arXiv:2204.04384  [pdf, other]

    cs.LG cs.CV

    The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization

    Authors: Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing

    Abstract: Training with an emphasis on "hard-to-learn" components of the data has been proven as an effective method to improve the generalization of machine learning models, especially in the settings where robustness (e.g., generalization across distributions) is valued. Existing literature discussing this "hard-to-learn" concept are mainly expanded either along the dimension of the samples or the dimensi…

    Submitted 9 April, 2022; originally announced April 2022.

    Comments: to appear at CVPR2022

  32. arXiv:2202.01336  [pdf, other]

    cs.LG

    Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation

    Authors: Yi-Fan Zhang, Hanlin Zhang, Zachary C. Lipton, Li Erran Li, Eric P. Xing

    Abstract: Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, where strong parametric assumptions are made but intractable for practical application. Recent work uses multilayer perceptron (MLP) for modeling causal relationships, however, MLPs lag far behind recent advances in ML methodology, which limits their applicability and generaliz…

    Submitted 17 October, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

  33. arXiv:2201.12023  [pdf, other]

    cs.LG cs.DC cs.PL

    Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

    Authors: Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

    Abstract: Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations. They do not suffice to scale out complex DL models…

    Submitted 28 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: OSDI 2022

  34. arXiv:2111.13839  [pdf, other]

    cs.LG cs.CV

    Towards Principled Disentanglement for Domain Generalization

    Authors: Hanlin Zhang, Yi-Fan Zhang, Weiyang Liu, Adrian Weller, Bernhard Schölkopf, Eric P. Xing

    Abstract: A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data, in part due to spurious correlations. To tackle this challenge, we first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG). We relax this non-trivial constrained optimization problem to a tractable form with finite…

    Submitted 19 October, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

    Comments: CVPR 2022 Oral

  35. arXiv:2111.01104  [pdf, other]

    stat.ML cs.AI cs.LG

    NOTMAD: Estimating Bayesian Networks with Sample-Specific Structures and Parameters

    Authors: Ben Lengerich, Caleb Ellington, Bryon Aragam, Eric P. Xing, Manolis Kellis

    Abstract: Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favor…

    Submitted 1 November, 2021; originally announced November 2021.

  36. arXiv:2110.05231  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types

    Authors: Shentong Mo, Xi Fu, Chenyang Hong, Yizhen Chen, Yuxuan Zheng, Xiangru Tang, Zhiqiang Shen, Eric P Xing, Yanyan Lan

    Abstract: In the genome biology research, regulatory genome modeling is an important topic for many regulatory downstream tasks, such as promoter classification, transcription factor binding sites prediction. The core problem is to model how regulatory elements interact with each other and its variability across different cell types. However, current deep learning methods often focus on modeling genome sequen…

    Submitted 3 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

  37. arXiv:2110.02784  [pdf, other]

    cs.MA cs.CR cs.LG

    Cooperative Multi-Agent Actor-Critic for Privacy-Preserving Load Scheduling in a Residential Microgrid

    Authors: Zhaoming Qin, Nanqing Dong, Eric P. Xing, Junwei Cao

    Abstract: As a scalable data-driven approach, multi-agent reinforcement learning (MARL) has made remarkable advances in solving the cooperative residential load scheduling problems. However, the common centralized training strategy of MARL algorithms raises privacy risks for involved households. In this work, we propose a privacy-preserving multi-agent actor-critic framework where the decentralized actors a…

    Submitted 6 October, 2021; originally announced October 2021.

  38. arXiv:2109.06379  [pdf, other]

    cs.CL cs.LG

    Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation

    Authors: Mingkai Deng, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Natural language generation (NLG) spans a broad range of tasks, each of which serves for specific objectives and desires different properties of generated text. The complexity makes automatic evaluation of NLG particularly challenging. Previous work has typically focused on a single task and developed individual evaluation metrics based on specific intuitions. In this paper, we propose a unifying…

    Submitted 21 January, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021, Code available at https://github.com/tanyuqian/ctc-gen-eval

  39. arXiv:2109.04707  [pdf, other]

    cs.CL cs.LG

    Knowledge-Aware Meta-learning for Low-Resource Text Classification

    Authors: Huaxiu Yao, Yingxin Wu, Maruan Al-Shedivat, Eric P. Xing

    Abstract: Meta-learning has achieved great success in leveraging the historical learned knowledge to facilitate the learning process of the new task. However, merely learning the knowledge from the historical tasks, adopted by current meta-learning algorithms, may not generalize well to testing tasks when they are not well-supported by training tasks. This paper studies a low-resource text classification pr…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted by EMNLP 2021

  40. arXiv:2108.07783  [pdf, other]

    cs.LG

    Toward a `Standard Model' of Machine Learning

    Authors: Zhiting Hu, Eric P. Xing

    Abstract: Machine learning (ML) is about computational methods that enable machines to learn concepts from experience. In handling a wide variety of experience ranging from data instances, knowledge, constraints, to rewards, adversaries, and lifelong interaction in an ever-growing spectrum of tasks, contemporary ML/AI (artificial intelligence) research has resulted in a multitude of learning paradigms and m…

    Submitted 10 January, 2023; v1 submitted 17 August, 2021; originally announced August 2021.

    Comments: 46 pages; Online version on Harvard Data Science Review: https://hdsr.mitpress.mit.edu/pub/zkib7xth/release/2

  41. arXiv:2106.09179  [pdf, other]

    cs.LG cs.AI stat.ML

    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation

    Authors: Yuxin Xiao, Eric P. Xing, Willie Neiswanger

    Abstract: With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not…

    Submitted 7 April, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

  42. arXiv:2106.07704  [pdf, other]

    cs.CL cs.LG

    Efficient (Soft) Q-Learning for Text Generation with Limited Good Data

    Authors: Han Guo, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which is not applicable to many emerging applications, such as generating adversarial attacks or generating prompts to control language models. Reinforcement learning (RL) on the other hand offers a more flexible solution by allowing users to pl…

    Submitted 22 October, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Code available at https://github.com/HanGuo97/soft-Q-learning-for-text-generation

  43. arXiv:2105.14517  [pdf, other]

    cs.AI

    GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

    Authors: Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

    Abstract: Automatic math problem solving has recently attracted increasing attention as a long-standing AI benchmark. In this paper, we focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. However, the existing methods were highly dependent on handcraft rules and were merely evaluated on small-scale datasets. There…

    Submitted 10 January, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: Accepted to Findings of ACL 2021

  44. arXiv:2103.01834  [pdf, other]

    cs.CL cs.AI

    A Data-Centric Framework for Composable NLP Workflows

    Authors: Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Haoying Zhang, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, Zhiting Hu

    Abstract: Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization. We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable mann…

    Submitted 1 September, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: 8 pages, 4 figures, EMNLP 2020

  45. Technology Readiness Levels for Machine Learning Systems

    Authors: Alexander Lavin, Ciarán M. Gilligan-Lee, Alessya Visnjic, Siddha Ganju, Dava Newman, Atılım Güneş Baydin, Sujoy Ganguly, Danny Lange, Amit Sharma, Stephan Zheng, Eric P. Xing, Adam Gibson, James Parr, Chris Mattmann, Yarin Gal

    Abstract: The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. The lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards t…

    Submitted 29 November, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

  46. arXiv:2012.09610  [pdf]

    cs.LG

    Validate and Enable Machine Learning in Industrial AI

    Authors: Hongbo Zou, Guangjing Chen, Pengtao Xie, Sean Chen, Yongtian He, Hochih Huang, Zheng Nie, Hongbao Zhang, Tristan Bala, Kazi Tulip, Yuqi Wang, Shenlin Qin, Eric P. Xing

    Abstract: Industrial Artificial Intelligence (Industrial AI) is an emerging concept which refers to the application of artificial intelligence to industry. Industrial AI promises more efficient future industrial control systems. However, manufacturers and solution partners need to understand how to implement and integrate an AI model into the existing industrial control system. A well-trained machine learni…

    Submitted 30 October, 2020; originally announced December 2020.

    Comments: 9 pages, 8 figures

  47. arXiv:2011.14164  [pdf, other]

    cs.CV cs.LG eess.IV

    Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data

    Authors: Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing

    Abstract: The data-driven nature of deep learning (DL) models for semantic segmentation requires a large number of pixel-level annotations. However, large-scale and fully labeled medical datasets are often unavailable for practical tasks. Recently, partially supervised methods have been proposed to utilize images with incomplete labels in the medical domain. To bridge the methodological gaps in partially su…

    Submitted 26 October, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: Accepted by Applied Soft Computing

  48. arXiv:2011.13052  [pdf, ps, other]

    cs.LG

    Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations

    Authors: Haohan Wang, Zeyi Huang, Xindi Wu, Eric P. Xing

    Abstract: Data augmentation is one of the most popular techniques for improving the robustness of neural networks. In addition to directly training the model with original samples and augmented samples, a torrent of methods regularizing the distance between embeddings/representations of the original samples and their augmented counterparts have been introduced. In this paper, we explore these various regula…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: 12 pages and an additional 9 pages as appendix

  49. arXiv:2010.12609  [pdf, other]

    cs.LG

    Iterative Graph Self-Distillation

    Authors: Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

    Abstract: Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs. To address this, we propose a method called Iterative Graph Self-Distillation (IGSD) which learns graph-level representation in an unsupervised manner through instance discrimination using a self-supervised contrastive learning approach. IGSD involves a teacher-student distillation process th…

    Submitted 3 January, 2023; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: The Workshop on Self-Supervised Learning for the Web

  50. arXiv:2010.09997  [pdf, other]

    cs.CL cs.AI

    Word Shape Matters: Robust Machine Translation with Visual Embedding

    Authors: Haohan Wang, Peiyan Zhang, Eric P. Xing

    Abstract: Neural machine translation has achieved remarkable empirical performance over standard benchmark datasets, yet recent evidence suggests that the models can still fail easily dealing with substandard inputs such as misspelled words. To overcome this issue, we introduce a new encoding heuristic of the input symbols for character-level NLP models: it encodes the shape of each character through the im…

    Submitted 20 October, 2020; originally announced October 2020.