Showing 1–30 of 30 results for author: Chiang, T

  1. arXiv:2406.17741  [pdf, other]

    cs.CV cs.AI

    Point-SAM: Promptable 3D Segmentation Model for Point Clouds

    Authors: Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su

    Abstract: The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.08307  [pdf, other]

    stat.ML cs.LG

    Measuring model variability using robust non-parametric testing

    Authors: Sinjini Banerjee, Tim Marrinan, Reilly Cannon, Tony Chiang, Anand D. Sarwate

    Abstract: Training a deep neural network often involves stochastic optimization, meaning each run will produce a different model. The seed used to initialize random elements of the optimization procedure heavily influences the quality of a trained model, which may be obscured by many commonly reported summary statistics, like accuracy. However, random seed is often not included in hyper-parameter optimizat…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2402.10424  [pdf, other]

    cs.CL cs.AI

    Understanding In-Context Learning with a Pelican Soup Framework

    Authors: Ting-Rui Chiang, Dani Yogatama

    Abstract: Many existing theoretical analyses of in-context learning for natural language processing are based on latent variable models that leave gaps between theory and practice. We aim to close these gaps by proposing a theoretical framework, the Pelican Soup Framework. In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classific…

    Submitted 15 February, 2024; originally announced February 2024.

  4. arXiv:2311.09615  [pdf, other]

    cs.CL

    On Retrieval Augmentation and the Limitations of Language Model Training

    Authors: Ting-Rui Chiang, Xinyan Velocity Yu, Joshua Robinson, Ollie Liu, Isabelle Lee, Dani Yogatama

    Abstract: Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive. In this work, we rule out one previously posited possibility -- the "softmax bottleneck." We then create a new dataset to evaluate LM generalization ability in the setting where training data contains additional…

    Submitted 2 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  5. arXiv:2310.18612  [pdf, other]

    cs.LG

    Efficient kernel surrogates for neural network-based regression

    Authors: Saad Qadeer, Andrew Engel, Amanda Howard, Adam Tsou, Max Vargas, Panos Stinis, Tony Chiang

    Abstract: Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialize…

    Submitted 24 January, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: 35 pages. Software used to reach results available upon request; approved for release by Pacific Northwest National Laboratory

    Report number: PNNL-SA-191858

    MSC Class: 68T07; 65M99

  6. arXiv:2310.16261  [pdf, other]

    cs.CL

    The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining

    Authors: Ting-Rui Chiang, Dani Yogatama

    Abstract: We analyze the masked language modeling pretraining objective function from the perspective of the distributional hypothesis. We investigate whether better sample efficiency and the better generalization capability of models pretrained with masked language modeling can be attributed to the semantic similarity encoded in the pretraining data's distributional property. Via a synthetic dataset, our a…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  7. arXiv:2310.13836  [pdf, other]

    cs.LG cs.CL

    Foundation Model's Embedded Representations May Detect Distribution Shift

    Authors: Max Vargas, Adam Tsou, Andrew Engel, Tony Chiang

    Abstract: Sampling biases can cause distribution shifts between train and test datasets for supervised learning tasks, obscuring our ability to understand the generalization capacity of a model. This is especially important considering the wide adoption of pre-trained foundational neural networks -- whose behavior remains poorly understood -- for transfer learning (TL) tasks. We present a case study for TL…

    Submitted 2 February, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 17 pages, 8 figures, 5 tables

  8. arXiv:2310.07724  [pdf, other]

    cs.RO cs.AI cs.LG

    Visual Forecasting as a Mid-level Representation for Avoidance

    Authors: Hsuan-Kung Yang, Tsung-Chih Chiang, Ting-Ru Liu, Chun-Wei Huang, Jou-Min Liu, Chun-Yi Lee

    Abstract: The challenge of navigation in environments with dynamic objects continues to be a central issue in the study of autonomous agents. While predictive methods hold promise, their reliance on precise state information makes them less practical for real-world implementation. This study presents visual forecasting as an innovative alternative. By introducing intuitive visual cues, this approach project…

    Submitted 17 September, 2023; originally announced October 2023.

    Comments: Tsung-Chih Chiang, Ting-Ru Liu, Chun-Wei Huang, and Jou-Min Liu contributed equally to this work; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  9. arXiv:2310.00541  [pdf, other]

    stat.ML cs.LG

    Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks

    Authors: Sinjini Banerjee, Reilly Cannon, Tim Marrinan, Tony Chiang, Anand D. Sarwate

    Abstract: Training a deep neural network (DNN) often involves stochastic optimization, which means each run will produce a different model. Several works suggest this variability is negligible when models have the same performance, which in the case of classification is test accuracy. However, models with similar test accuracy may not be computing the same function. We propose a new measure of closeness bet…

    Submitted 30 September, 2023; originally announced October 2023.

  10. arXiv:2309.15328  [pdf, other]

    cs.LG

    Exploring Learned Representations of Neural Networks with Principal Component Analysis

    Authors: Amit Harlev, Andrew Engel, Panos Stinis, Tony Chiang

    Abstract: Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We sh…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures

  11. arXiv:2307.11684  [pdf, other]

    cs.LG

    Minibatching Offers Improved Generalization Performance for Second Order Optimizers

    Authors: Eric Silk, Swarnita Chakraborty, Nairanjana Dasgupta, Anand D. Sarwate, Andrew Lumsdaine, Tony Chiang

    Abstract: Training deep neural networks (DNNs) used in modern machine learning is computationally expensive. Machine learning scientists, therefore, rely on stochastic first-order methods for training, coupled with significant hand-tuning, to obtain good performance. To better understand performance variability of different stochastic algorithms, including second-order methods, we conduct an empirical study…

    Submitted 25 May, 2023; originally announced July 2023.

    Comments: 14 pages, 6 figures, 5 tables

  12. arXiv:2305.14585  [pdf, other]

    cs.LG

    Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate Models

    Authors: Andrew Engel, Zhichao Wang, Natalie S. Frank, Ioana Dumitriu, Sutanay Choudhury, Anand Sarwate, Tony Chiang

    Abstract: A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution…

    Submitted 11 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 9 pages, 2 figures, 3 tables. Updated 3/11/2024 with various additions/clarifications after ICLR review. Accepted as a Spotlight paper at ICLR 2024

  13. arXiv:2303.02731  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Virtual Guidance as a Mid-level Representation for Navigation

    Authors: Hsuan-Kung Yang, Tsung-Chih Chiang, Ting-Ru Liu, Chun-Wei Huang, Jou-Min Liu, Chun-Yi Lee

    Abstract: In the context of autonomous navigation, effectively conveying abstract navigational cues to agents in dynamic environments poses challenges, particularly when the navigation information is multimodal. To address this issue, the paper introduces a novel technique termed "Virtual Guidance," which is designed to visually represent non-visual instructional signals. These visual cues, rendered as colo…

    Submitted 17 September, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Tsung-Chih Chiang, Ting-Ru Liu, Chun-Wei Huang, and Jou-Min Liu contributed equally to this work; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2212.14801  [pdf, other]

    cs.CV

    ExReg: Wide-range Photo Exposure Correction via a Multi-dimensional Regressor with Attention

    Authors: Tzu-Hao Chiang, Hao-Chien Hsueh, Ching-Chun Hsiao, Ching-Chun Huang

    Abstract: Photo exposure correction is widely investigated, but fewer studies focus on correcting under- and over-exposed images simultaneously. Three issues remain open in handling and correcting under- and over-exposed images in a unified way. First, a locally-adaptive exposure adjustment may be more flexible than learning a global mapping. Second, it is an ill-posed problem to determine the suitable expos…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 12 pages, 8 figures

  15. arXiv:2211.06506  [pdf, other]

    cs.LG stat.ML

    Spectral Evolution and Invariance in Linear-width Neural Networks

    Authors: Zhichao Wang, Andrew Engel, Anand Sarwate, Ioana Dumitriu, Tony Chiang

    Abstract: We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to network width. Empirically, we show that the spectra of weight in this high dimensional regime are invariant when trained by gradient descent for small constant learning rates; we provide a theoretical justification for this observation and prove the invarian…

    Submitted 7 November, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted by NeurIPS 2023

  16. arXiv:2207.12551  [pdf, other]

    cs.CL

    DialCrowd 2.0: A Quality-Focused Dialog System Crowdsourcing Toolkit

    Authors: Jessica Huynh, Ting-Rui Chiang, Jeffrey Bigham, Maxine Eskenazi

    Abstract: Dialog system developers need high-quality data to train, fine-tune and assess their systems. They often use crowdsourcing for this since it provides large quantities of data from many workers. However, the data may not be of sufficiently good quality. This can be due to the way that the requester presents a task and how they interact with the workers. This paper introduces DialCrowd 2.0 to help r…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Published at LREC 2022

  17. arXiv:2205.12372  [pdf, other]

    cs.LG

    TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models

    Authors: Andrew Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang

    Abstract: We introduce torchNTK, a Python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework. We provide an efficient method to calculate the NTK of multilayer perceptrons. We compare the explicit differentiation implementation against autodifferentiation implementations, which have the benefit of extending the utility of the library to any archi…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 19 pages, 5 figures

  18. arXiv:2110.08130  [pdf, other]

    cs.CL

    Breaking Down Multilingual Machine Translation

    Authors: Ting-Rui Chiang, Yi-Pei Chen, Yi-Ting Yeh, Graham Neubig

    Abstract: While multilingual training is now an essential ingredient in machine translation (MT) systems, recent work has demonstrated that it has different effects in different multilingual settings, such as many-to-one, one-to-many, and many-to-many learning. These training settings expose the encoder and the decoder in a machine translation model to different data distributions. In this paper, we exami…

    Submitted 3 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ACL 2022 Findings

  19. arXiv:2110.05665  [pdf, other]

    cs.CL

    Are you doing what I say? On modalities alignment in ALFRED

    Authors: Ting-Rui Chiang, Yi-Ting Yeh, Ta-Chung Chi, Yau-Shian Wang

    Abstract: ALFRED is a recently proposed benchmark that requires a model to complete tasks in simulated house environments specified by instructions in natural language. We hypothesize that key to success is accurately aligning the text modality with visual inputs. Motivated by this, we inspect how well existing models can align these modalities using our proposed intrinsic metric, boundary adherence score (…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted by Novel Ideas in Learning-to-Learn through Interaction at EMNLP 2021

  20. arXiv:2110.05301  [pdf, other]

    cs.CL

    On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias

    Authors: Ting-Rui Chiang

    Abstract: Despite the success of pretrained masked language models (MLM), why MLM pretraining is useful is still a question not fully answered. In this work we theoretically and empirically show that MLM pretraining makes models robust to lexicon-level spurious features, partly answering the question. We theoretically show that, when we can model the distribution of a spurious feature $Π$ conditioned on the co…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Work in progress

  21. arXiv:2109.14144  [pdf, other]

    cs.CL cs.LG

    Improving Dialogue State Tracking by Joint Slot Modeling

    Authors: Ting-Rui Chiang, Yi-Ting Yeh

    Abstract: Dialogue state tracking models play an important role in a task-oriented dialogue system. However, most of them model the slot types conditionally independently given the input. We discover that it may cause the model to be confused by slot types that share the same data type. To mitigate this issue, we propose TripPy-MRF and TripPy-LSTM that model the slots jointly. Our results show that they ar…

    Submitted 14 November, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Accepted to the 3rd Workshop on NLP for ConvAI in EMNLP 2021

  22. arXiv:2109.08705  [pdf, other]

    cs.CL cs.LG

    Relating Neural Text Degeneration to Exposure Bias

    Authors: Ting-Rui Chiang, Yun-Nung Chen

    Abstract: This work focuses on relating two mysteries in neural-based text generation: exposure bias, and text degeneration. Despite the long time since exposure bias was mentioned and the numerous studies for its remedy, to our knowledge, its impact on text generation has not yet been verified. Text degeneration is a problem that the widely-used pre-trained language model GPT-2 was recently found to suffer…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: Accepted by BlackBoxNLP at EMNLP 2021

  23. arXiv:2106.07137  [pdf, other]

    cs.CL

    Why Can You Lay Off Heads? Investigating How BERT Heads Transfer

    Authors: Ting-Rui Chiang, Yun-Nung Chen

    Abstract: The huge size of the widely used BERT family models has led to recent efforts on model distillation. The main goal of distillation is to create a task-agnostic pre-trained model that can be fine-tuned on downstream tasks without fine-tuning its full-sized version. Despite the progress of distillation, to what degree and for what reason a task-agnostic model can be created from distillation has…

    Submitted 13 June, 2021; originally announced June 2021.

  24. Deriving monadic quicksort (Declarative Pearl)

    Authors: Shin-Cheng Mu, Tsung-Ju Chiang

    Abstract: To demonstrate derivation of monadic programs, we present a specification of sorting using the non-determinism monad, and derive pure quicksort on lists and state-monadic quicksort on arrays. In the derivation one may switch between point-free and pointwise styles, and deploy techniques familiar to functional programmers such as pattern matching and induction on structures or on sizes. Derivation…

    Submitted 27 January, 2021; originally announced January 2021.

    Journal ref: In Nakano K., Sagonas K. (eds) Functional and Logic Programming (FLOPS 2020). LNCS 12073. pp 124-138. 2020

  25. Longest segment of balanced parentheses -- an exercise in program inversion in a segment problem (Functional Pearl)

    Authors: Shin-Cheng Mu, Tsung-Ju Chiang

    Abstract: Given a string of parentheses, the task is to find the longest consecutive segment that is balanced, in linear time. We find this problem interesting because it involves a combination of techniques: the usual approach for solving segment problems, and a theorem for constructing the inverse of a function -- through which we derive an instance of shift-reduce parsing.

    Submitted 21 August, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Journal ref: Journal of Functional Programming, Volume 31, 2021, e31

  26. arXiv:1909.10743  [pdf, other]

    cs.CL

    An Empirical Study of Content Understanding in Conversational Question Answering

    Authors: Ting-Rui Chiang, Hao-Tong Ye, Yun-Nung Chen

    Abstract: With a lot of work about context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to the recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent work has achieved competitive performance on both datasets. However, to best of…

    Submitted 27 November, 2019; v1 submitted 24 September, 2019; originally announced September 2019.

    Comments: Published at AAAI 2020

  27. arXiv:1903.08953  [pdf, other]

    cs.CL

    Learning Multi-Level Information for Dialogue Response Selection by Highway Recurrent Transformer

    Authors: Ting-Rui Chiang, Chao-Wei Huang, Shang-Yu Su, Yun-Nung Chen

    Abstract: With the increasing research interest in dialogue response generation, there is an emerging branch formulating this task as selecting next sentences, where given the partial dialogue contexts, the goal is to determine the most probable next sentence. Following the recent success of the Transformer model, this paper proposes (1) a new variant of attention mechanism based on multi-head attention, ca…

    Submitted 21 March, 2019; originally announced March 2019.

  28. arXiv:1903.08905  [pdf, other]

    cs.CL

    RAP-Net: Recurrent Attention Pooling Networks for Dialogue Response Selection

    Authors: Chao-Wei Huang, Ting-Rui Chiang, Shang-Yu Su, Yun-Nung Chen

    Abstract: Response selection has been an emerging research topic due to the growing interest in dialogue modeling, where the goal of the task is to select an appropriate response for continuing dialogues. To further push the end-to-end dialogue model toward real-world scenarios, the seventh Dialog System Technology Challenge (DSTC7) proposed a challenging track based on real chatlog datasets. The compet…

    Submitted 21 March, 2019; originally announced March 2019.

  29. arXiv:1811.00720  [pdf, other]

    cs.CL

    Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems

    Authors: Ting-Rui Chiang, Yun-Nung Chen

    Abstract: Solving math word problems is a challenging task that requires accurate natural language understanding to bridge natural language texts and math expressions. Motivated by the intuition about how humans generate the equations given the problem texts, this paper presents a neural approach to automatically solve math word problems by operating symbols according to their semantic meanings in texts. Th…

    Submitted 9 June, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

  30. arXiv:cs/0508042   

    cs.HC

    OpenVanilla - A Non-Intrusive Plug-In Framework of Text Services

    Authors: Tien-chien Chiang, Deng-Liu, Kang-min Liu, Weizhong Yang, Pek-tiong Tan, Mengjuei Hsieh, Tsung-hsiang Chang, Wen-Lien Hsu

    Abstract: This paper has been withdrawn by the author, because it was merged into cs.HC/0508041

    Submitted 29 June, 2006; v1 submitted 4 August, 2005; originally announced August 2005.

    Comments: This paper has been withdrawn

    ACM Class: H.5.2