Showing 1–50 of 58 results for author: Arik, S

  1. arXiv:2406.15708

    cs.CL cs.AI cs.LG

    Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization

    Authors: Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Sercan O. Arik

    Abstract: Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, the…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.04153

    cs.LG

    Learned Feature Importance Scores for Automated Feature Engineering

    Authors: Yihe Dong, Sercan Arik, Nathanael Yoder, Tomas Pfister

    Abstract: Feature engineering has demonstrated substantial utility for many machine learning workflows, such as in the small data regime or when distribution shifts are severe. Thus automating this capability can relieve much manual effort and improve model performance. Towards this, we propose AutoMAN, or Automated Mask-based Feature Engineering, an automated feature engineering framework that achieves hig…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2406.02818

    cs.CL

    Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

    Authors: Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik

    Abstract: Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs). Two common strategies have emerged: 1) reducing the input length, such as retrieving relevant chunks by Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. However, both strategies have drawbacks: input reduction has no guarantee of cov…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 19 pages, 6 figures

  4. arXiv:2406.00222

    cs.CL cs.AI cs.LG

    Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

    Authors: Maximillian Chen, Ruoxi Sun, Sercan Ö. Arık, Tomas Pfister

    Abstract: Large language models (LLMs) aligned through reinforcement learning from human feedback (RLHF) have quickly become one of the dominant paradigms for building intelligent conversational assistant agents. However, despite their strong performance across many benchmarks, LLM-based agents still lack conversational skills such as disambiguation: when generalized assistants are faced with ambiguity, the…

    Submitted 31 May, 2024; originally announced June 2024.

  5. arXiv:2405.18654

    cs.CV

    Mitigating Object Hallucination via Data Augmented Contrastive Tuning

    Authors: Pritam Sarkar, Sayna Ebrahimi, Ali Etemad, Ahmad Beirami, Sercan Ö. Arık, Tomas Pfister

    Abstract: Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations wh…

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2404.09491

    cs.LG

    Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

    Authors: Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister

    Abstract: Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, that is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. T…

    Submitted 6 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML, 2024

  7. arXiv:2312.01279

    cs.CL cs.AI cs.LG

    TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents

    Authors: James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O Arik, Yan Liu, Tomas Pfister

    Abstract: Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developm…

    Submitted 2 December, 2023; originally announced December 2023.

  8. arXiv:2311.09533

    cs.CL

    Effective Large Language Model Adaptation for Improved Grounding and Citation Generation

    Authors: Xi Ye, Ruoxi Sun, Sercan Ö. Arik, Tomas Pfister

    Abstract: Large language models (LLMs) have achieved remarkable advancements in natural language understanding and generation. However, one major issue towards their widespread deployment in the real world is that they can generate "hallucinated" answers that are not factual. Towards this end, this paper focuses on improving LLMs by grounding their responses in retrieved passages and by providing citations.…

    Submitted 2 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  9. arXiv:2311.02883

    cs.CL

    SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data

    Authors: Ruoxi Sun, Sercan Ö. Arik, Rajarishi Sinha, Hootan Nakhost, Hanjun Dai, Pengcheng Yin, Tomas Pfister

    Abstract: Text-to-SQL aims to automate the process of generating SQL queries on a database from natural language text. In this work, we propose "SQLPrompt", tailored to improve the few-shot prompting capabilities of Text-to-SQL for Large Language Models (LLMs). Our methods include innovative prompt design, execution-based consistency decoding strategy which selects the SQL with the most consistent execution…

    Submitted 6 November, 2023; originally announced November 2023.

  10. arXiv:2311.00886

    cs.LG

    COSTAR: Improved Temporal Counterfactual Estimation with Self-Supervised Learning

    Authors: Chuizheng Meng, Yihe Dong, Sercan Ö. Arık, Yan Liu, Tomas Pfister

    Abstract: Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality. For real-world datasets, modeling time-dependent confounders is challenging due to complex dynamics, long-range dependencies and both past treatments and…

    Submitted 12 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

  11. arXiv:2310.11689

    cs.CL cs.LG

    Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

    Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha

    Abstract: Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions wh…

    Submitted 11 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Paper published at Findings of the Association for Computational Linguistics: EMNLP, 2023

  12. arXiv:2310.08750

    cs.LG

    Search-Adaptor: Embedding Customization for Information Retrieval

    Authors: Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister

    Abstract: Embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of the information from the relevant query-corpus paired data can further boost the LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, fo…

    Submitted 12 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  13. arXiv:2310.04948

    cs.LG cs.CL

    TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

    Authors: Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, Yan Liu

    Abstract: The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various t…

    Submitted 2 April, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024. Camera Ready Version

  14. arXiv:2308.13703

    cs.LG

    PAITS: Pretraining and Augmentation for Irregularly-Sampled Time Series

    Authors: Nicasia Beebe-Wang, Sayna Ebrahimi, Jinsung Yoon, Sercan O. Arik, Tomas Pfister

    Abstract: Real-world time series data that commonly reflect sequential human behavior are often uniquely irregularly sampled and sparse, with highly nonuniform sampling over time and entities. Yet, commonly-used pretraining and augmentation methods for time series are not specifically designed for such scenarios. In this paper, we present PAITS (Pretraining and Augmentation for Irregularly-sampled Time Seri…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Code: https://github.com/google-research/google-research/tree/master/irregular_timeseries_pretraining

  15. arXiv:2308.13118

    cs.LG

    Business Metric-Aware Forecasting for Inventory Management

    Authors: Helen Zhou, Sercan O. Arik, Jingtao Wang

    Abstract: Time-series forecasts play a critical role in business planning. However, forecasters typically optimize objectives that are agnostic to downstream business goals and thus can produce forecasts misaligned with business preferences. In this work, we demonstrate that optimization of conventional forecasting metrics can often lead to sub-optimal downstream business performance. Focusing on the invent…

    Submitted 24 August, 2023; originally announced August 2023.

  16. arXiv:2306.00739

    cs.CL cs.AI cs.DB

    SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)

    Authors: Ruoxi Sun, Sercan Ö. Arik, Alex Muzio, Lesly Miculicich, Satya Gundabathula, Pengcheng Yin, Hanjun Dai, Hootan Nakhost, Rajarishi Sinha, Zifeng Wang, Tomas Pfister

    Abstract: Text-to-SQL, the process of translating natural language into Structured Query Language (SQL), represents a transformative application of large language models (LLMs), potentially revolutionizing how humans interact with data. This paper introduces the SQL-PaLM framework, a comprehensive solution for understanding and enhancing Text-to-SQL using LLMs, using in the learning regimes of few-shot prom…

    Submitted 30 March, 2024; v1 submitted 26 May, 2023; originally announced June 2023.

  17. arXiv:2305.16556

    cs.LG cs.AI

    LANISTR: Multimodal Learning from Structured and Unstructured Data

    Authors: Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister

    Abstract: Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured…

    Submitted 24 April, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  18. arXiv:2305.14926

    cs.CL cs.AI cs.LG

    Universal Self-Adaptive Prompting

    Authors: Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Martin Eisenschlos, Sercan O. Arik, Tomas Pfister

    Abstract: A hallmark of modern large language models (LLMs) is their impressive general zero-shot and few-shot abilities, often elicited through in-context learning (ICL) via prompting. However, while highly coveted and being the most general, zero-shot performances in LLMs are still typically weaker due to the lack of guidance and the difficulty of applying existing automatic prompt design methods in gener…

    Submitted 20 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 (Main). 10 pages, 5 figures, 4 tables (26 pages, 9 figures and 13 tables including references and appendices)

  19. arXiv:2305.14106

    cs.CL cs.AI cs.LG

    Better Zero-Shot Reasoning with Self-Adaptive Prompting

    Authors: Xingchen Wan, Ruoxi Sun, Hanjun Dai, Sercan O. Arik, Tomas Pfister

    Abstract: Modern large language models (LLMs) have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans. This is made possible by their strong few and zero-shot abilities -- they can effectively learn from a handful of handcrafted, completed responses ("in-context examples"), or are prompted to reason spontaneously through specially designed tri…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of the Association for Computational Linguistics: ACL 2023. 10 pages, 2 tables, 4 figures (20 pages, 8 tables, 7 figures including references and appendices)

  20. arXiv:2304.03870

    cs.LG

    ASPEST: Bridging the Gap Between Active Learning and Selective Prediction

    Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister

    Abstract: Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependenc…

    Submitted 29 February, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  21. arXiv:2304.03202

    cs.LG

    SLM: End-to-end Feature Selection via Sparse Learnable Masks

    Authors: Yihe Dong, Sercan O. Arik

    Abstract: Feature selection has been widely used to alleviate compute requirements during training, elucidate model interpretability, and improve model generalizability. We propose SLM -- Sparse Learnable Masks -- a canonical approach for end-to-end feature selection that scales well with respect to both the feature dimension and the number of samples. At the heart of SLM lies a simple but effective learnab…

    Submitted 6 April, 2023; originally announced April 2023.

  22. arXiv:2303.06053

    cs.LG cs.AI

    TSMixer: An All-MLP Architecture for Time Series Forecasting

    Authors: Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, Tomas Pfister

    Abstract: Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending them, in t…

    Submitted 11 September, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Journal ref: Transactions on Machine Learning Research (TMLR), 09/2023

  23. arXiv:2301.04857

    cs.AI stat.ME

    Neural Spline Search for Quantile Probabilistic Modeling

    Authors: Ruoxi Sun, Chun-Liang Li, Sercan O. Arik, Michael W. Dusenberry, Chen-Yu Lee, Tomas Pfister

    Abstract: Accurate estimation of output quantiles is crucial in many use cases, where it is desired to model the range of possibility. Modeling target distribution at arbitrary quantile levels and at arbitrary input attribute levels are important to offer a comprehensive picture of the data, and requires the quantile function to be expressive enough. The quantile function describing the target distribution…

    Submitted 12 January, 2023; originally announced January 2023.

  24. arXiv:2212.00173

    cs.LG

    SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch

    Authors: Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister

    Abstract: Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework: Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) that isn't limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, the assumption is often violated in many applications - for ex…

    Submitted 30 November, 2022; originally announced December 2022.

  25. arXiv:2211.06582

    cs.LG cs.CR stat.ML

    Provable Membership Inference Privacy

    Authors: Zachary Izzo, Jinsung Yoon, Sercan O. Arik, James Zou

    Abstract: In applications involving sensitive data, such as finance and healthcare, the necessity for preserving data privacy can be a significant barrier to machine learning model development. Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP gua…

    Submitted 12 November, 2022; originally announced November 2022.

    Comments: 19 pages, 2 figures

  26. arXiv:2210.03675

    cs.LG stat.ML

    Koopman Neural Forecaster for Time Series with Temporal Distribution Shifts

    Authors: Rui Wang, Yihe Dong, Sercan Ö. Arik, Rose Yu

    Abstract: Temporal distributional shifts, with underlying dynamics changing over time, frequently occur in real-world time series and pose a fundamental challenge for deep neural networks (DNNs). In this paper, we propose a novel deep sequence model based on the Koopman theory for time series forecasting: Koopman Neural Forecaster (KNF) which leverages DNNs to learn the linear Koopman space and the coeffici…

    Submitted 28 February, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

  27. arXiv:2209.07999

    cs.LG cs.AI cs.CV cs.IT eess.IV

    Self-Supervised Learning with an Information Maximization Criterion

    Authors: Serdar Ozsoy, Shadi Hamdan, Sercan Ö. Arik, Deniz Yuret, Alper T. Erdogan

    Abstract: Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem to many self-supervised learning approaches, making self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In…

    Submitted 16 September, 2022; originally announced September 2022.

    ACM Class: I.2; I.4; I.5

  28. arXiv:2206.07240

    cs.CV cs.AI cs.LG

    Test-Time Adaptation for Visual Document Understanding

    Authors: Sayna Ebrahimi, Sercan O. Arik, Tomas Pfister

    Abstract: For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document…

    Submitted 23 August, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at TMLR 2023

  29. arXiv:2206.06469

    cs.LG stat.ML

    Invariant Structure Learning for Better Generalization and Causal Explainability

    Authors: Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister

    Abstract: Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target acr…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 16 pages (including Appendix), 4 figures

  30. arXiv:2206.02107

    cs.LG

    Interpretable Mixture of Experts

    Authors: Aya Abdelsalam Ismail, Sercan Ö. Arik, Jinsung Yoon, Ankur Taly, Soheil Feizi, Tomas Pfister

    Abstract: The need for reliable model explanations is prominent for many machine learning applications, particularly for tabular and time-series data as their use cases often involve high-stakes decision making. Towards this goal, we introduce a novel interpretable modeling framework, Interpretable Mixture of Experts (IME), that yields high accuracy, comparable to `black-box' Deep Neural Networks (DNNs) in…

    Submitted 25 May, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

  31. arXiv:2203.02034

    cs.LG

    Data-Efficient and Interpretable Tabular Anomaly Detection

    Authors: Chun-Hao Chang, Jinsung Yoon, Sercan Arik, Madeleine Udell, Tomas Pfister

    Abstract: Anomaly detection (AD) plays an important role in numerous applications. We focus on two understudied aspects of AD that are critical for integration into real-world applications. First, most AD methods cannot incorporate labeled data that are often available in practice in small quantities and can be crucial to achieve high AD accuracy. Second, most AD methods are not interpretable, a bottleneck…

    Submitted 4 June, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted in 2023 KDD

  32. arXiv:2202.02403

    cs.LG cs.AI

    Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series

    Authors: Sercan O. Arik, Nathanael C. Yoder, Tomas Pfister

    Abstract: Real-world time-series datasets often violate the assumptions of standard supervised learning for forecasting -- their distributions evolve over time, rendering the conventional training and model selection procedures suboptimal. In this paper, we propose a novel method, Self-Adaptive Forecasting (SAF), to modify the training of time-series forecasting models to improve their performance on foreca…

    Submitted 26 September, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  33. arXiv:2202.02262

    cs.LG

    Decoupling Local and Global Representations of Time Series

    Authors: Sana Tonekaboni, Chun-Liang Li, Sercan Arik, Anna Goldenberg, Tomas Pfister

    Abstract: Real-world time series data are often generated from several sources of variation. Learning representations that capture the factors contributing to this variability enables a better understanding of the data via its underlying generative process and improves performance on downstream machine learning tasks. This paper proposes a novel generative approach for learning representations for the globa…

    Submitted 11 February, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  34. arXiv:2106.07804

    cs.LG stat.ML

    Controlling Neural Networks with Rule Representations

    Authors: Sungyong Seo, Sercan O. Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister

    Abstract: We propose a novel training method that integrates rules into deep learning, in a way the strengths of the rules are controllable at inference. Deep Neural Networks with Controllable Rule Representations (DeepCTRL) incorporates a rule encoder into the model coupled with a rule-based objective, enabling a shared representation for decision making. DeepCTRL is agnostic to data type and model archite…

    Submitted 16 November, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  35. arXiv:2106.06115

    cs.LG

    Self-supervise, Refine, Repeat: Improving Unsupervised Anomaly Detection

    Authors: Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Chen-Yu Lee, Tomas Pfister

    Abstract: Anomaly detection (AD), separating anomalies from normal data, has many applications across domains, from security to healthcare. While most previous works were shown to be effective for cases with fully or partially labeled data, that setting is in practice less common due to labeling being particularly tedious for this task. In this paper, we focus on fully unsupervised AD, in which the entire t…

    Submitted 4 August, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Published in Transactions on Machine Learning Research (TMLR) - August, 2022 - https://openreview.net/forum?id=b3v1UrtF6G

  36. arXiv:2105.12723

    cs.CV

    Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

    Authors: Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister

    Abstract: Hierarchical structures are popular in recent vision transformers, however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical way. We find that the block aggregation function plays a critical role in enabling cross-block non-local informa…

    Submitted 30 December, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: AAAI2022

  37. arXiv:2008.00646

    cs.LG stat.ML

    Interpretable Sequence Learning for COVID-19 Forecasting

    Authors: Sercan O. Arik, Chun-Liang Li, Jinsung Yoon, Rajarishi Sinha, Arkady Epshteyn, Long T. Le, Vikas Menon, Shashank Singh, Leyou Zhang, Nate Yoder, Martin Nikoltchev, Yash Sonthalia, Hootan Nakhost, Elli Kanal, Tomas Pfister

    Abstract: We propose a novel approach that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19. Our model is explainable by design as it explicitly shows how different compartments evolve and it uses interpretable encoders to incorporate covariates and improve performance. Explainability is valuable to ensure that the model's forecasts are credible to epide…

    Submitted 13 January, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

  38. arXiv:2007.07477

    cs.CV cs.AI cs.LG

    Explaining Deep Neural Networks using Unsupervised Clustering

    Authors: Yu-han Liu, Sercan O. Arik

    Abstract: We propose a novel method to explain trained deep neural networks (DNNs), by distilling them into surrogate models using unsupervised clustering. Our method can be applied flexibly to any subset of layers of a DNN architecture and can incorporate low-level and high-level information. On image datasets given pre-trained DNNs, we demonstrate the strength of our method in finding similar training sam…

    Submitted 15 July, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

  39. arXiv:1912.09363

    stat.ML cs.LG

    Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

    Authors: Bryan Lim, Sercan O. Arik, Nicolas Loeff, Tomas Pfister

    Abstract: Multi-horizon forecasting problems often contain a complex mix of inputs -- including static (i.e. time-invariant) covariates, known future inputs, and other exogenous time series that are only observed historically -- without any prior information on how they interact with the target. While several deep learning models have been proposed for multi-step prediction, they typically comprise black-bo…

    Submitted 27 September, 2020; v1 submitted 19 December, 2019; originally announced December 2019.

  40. arXiv:1912.01730

    cs.LG stat.ML

    Distance-Based Learning from Errors for Confidence Calibration

    Authors: Chen Xing, Sercan Arik, Zizhao Zhang, Tomas Pfister

    Abstract: Deep neural networks (DNNs) are poorly calibrated when trained in conventional ways. To improve confidence calibration of DNNs, we propose a novel training method, distance-based learning from errors (DBLE). DBLE bases its confidence estimation on distances in the representation space. In DBLE, we first adapt prototypical learning to train classification models. It yields a representation space wh…

    Submitted 17 February, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

  41. arXiv:1910.07969

    cs.LG stat.ML

    On Completeness-aware Concept-Based Explanations in Deep Neural Networks

    Authors: Chih-Kuan Yeh, Been Kim, Sercan O. Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar

    Abstract: Human explanations of high-level decisions are often expressed in terms of key concepts the decisions are based on. In this paper, we study such concept-based explainability for Deep Neural Networks (DNNs). First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model's prediction behavior based on the assumption that complete co…

    Submitted 7 February, 2022; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: Updated supplementary

  42. arXiv:1910.07153

    cs.LG cs.CV

    Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost

    Authors: Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister

    Abstract: Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high value data that can best improve model performance. In pool-based active learning, accessible unlabeled data are not used for model training in most conventional methods. Here, we propose to unify unlabeled sample selection and model training towards minimizing labelin…

    Submitted 18 July, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted by ECCV2020

  43. arXiv:1910.00701

    cs.LG cs.CV stat.ML

    Distilling Effective Supervision from Severe Label Noise

    Authors: Zizhao Zhang, Han Zhang, Sercan O. Arik, Honglak Lee, Tomas Pfister

    Abstract: Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer a lot from label noise. This paper targets the challenge of robust training in high label noise regimes. The key insight to achieve this goal is to wisely leverage a small trusted set to estimate exemplar…

    Submitted 12 June, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: CVPR2020

  44. arXiv:1909.12367  [pdf, other]

    cs.LG stat.ML

    LIMIS: Locally Interpretable Modeling using Instance-wise Subsampling

    Authors: Jinsung Yoon, Sercan O. Arik, Tomas Pfister

    Abstract: Understanding black-box machine learning models is crucial for their widespread adoption. Learning globally interpretable models is one approach, but achieving high performance with them is challenging. An alternative approach is to explain individual predictions using locally interpretable models. For locally interpretable modeling, various methods have been proposed and indeed commonly used, but…

    Submitted 21 September, 2022; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Published in Transactions on Machine Learning Research (TMLR) - September, 2022 - https://openreview.net/forum?id=S8eABAy8P3

  45. arXiv:1909.11671  [pdf, other]

    cs.LG stat.ML

    Data Valuation using Reinforcement Learning

    Authors: Jinsung Yoon, Sercan O. Arik, Tomas Pfister

    Abstract: Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4) robust learning. To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation usin…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: 17 pages, 12 figures, 6 tables

  46. arXiv:1908.11406  [pdf, other]

    cs.LG cs.AI

    Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning

    Authors: Linchao Zhu, Sercan O. Arik, Yi Yang, Tomas Pfister

    Abstract: We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset. Our framework considers cooperative optimization of shared weights between models for source and target tasks, and adjusts the constituent loss weights adaptively. The adaptation of the weights…

    Submitted 16 July, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

  47. arXiv:1908.07442  [pdf, other]

    cs.LG stat.ML

    TabNet: Attentive Interpretable Tabular Learning

    Authors: Sercan O. Arik, Tomas Pfister

    Abstract: We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features. We demonstrate that TabNet outperforms other neural network and decision…

    Submitted 9 December, 2020; v1 submitted 20 August, 2019; originally announced August 2019.

  48. arXiv:1907.04648  [pdf, other]

    cs.LG

    EPNAS: Efficient Progressive Neural Architecture Search

    Authors: Yanqi Zhou, Peng Wang, Sercan Arik, Haonan Yu, Syed Zawad, Feng Yan, Greg Diamos

    Abstract: In this paper, we propose Efficient Progressive Neural Architecture Search (EPNAS), a neural architecture search (NAS) that efficiently handles large search space through a novel progressive search policy with performance prediction based on REINFORCE (Williams, 1992). EPNAS is designed to search target networks in parallel, which is more scalable on parallel systems such as GPU/TPU cluster…

    Submitted 6 July, 2019; originally announced July 2019.

  49. arXiv:1902.06292  [pdf, other]

    cs.LG cs.CV

    ProtoAttend: Attention-Based Prototypical Learning

    Authors: Sercan O. Arik, Tomas Pfister

    Abstract: We propose a novel inherently interpretable machine learning method that bases decisions on few relevant examples that we call prototypes. Our method, ProtoAttend, can be integrated into a wide range of neural network architectures including pre-trained models. It utilizes an attention mechanism that relates the encoded representations to samples in order to determine prototypes. The resulting mod…

    Submitted 25 September, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

  50. arXiv:1808.06719  [pdf, other]

    cs.SD cs.LG eess.AS

    Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

    Authors: Sercan O. Arik, Heewoo Jun, Gregory Diamos

    Abstract: We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core pro…

    Submitted 5 November, 2018; v1 submitted 20 August, 2018; originally announced August 2018.