Showing 1–38 of 38 results for author: Tsai, Y H

Searching in archive cs.
  1. arXiv:2405.13194  [pdf, other]

    cs.CV

    KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

    Authors: Hugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot, Jian Zhang

    Abstract: In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space, instead of relying on Multi-Layer Perceptron (MLP) encodings. While it initially achieved success, it has since been surpassed by recent MLP networks that employ updated designs and training strategies. Building upon the kernel point principle, we presen…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  2. arXiv:2402.00251  [pdf, other]

    cs.LG cs.AI cs.CL

    Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning

    Authors: Yao-Hung Hubert Tsai, Walter Talbott, Jian Zhang

    Abstract: Step-by-step decision planning with large language models (LLMs) is gaining attention in AI agent development. This paper focuses on decision planning with uncertainty estimation to address the hallucination problem in language models. Existing approaches are either white-box or computationally demanding, limiting use of black-box proprietary LLMs within budgets. The paper's first contribution is…

    Submitted 31 January, 2024; originally announced February 2024.

  3. arXiv:2310.10143  [pdf, other]

    stat.ML cs.LG

    An Empirical Study of Self-supervised Learning with Wasserstein Distance

    Authors: Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai

    Abstract: In this study, we delve into the problem of self-supervised learning (SSL) utilizing the 1-Wasserstein distance on a tree structure (a.k.a., Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. In SSL methods, the cosine similarity is often utilized as an objective function; however, it has not been well studied when utilizing the Wasserstein…

    Submitted 5 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  4. arXiv:2310.08669  [pdf, other]

    cs.CV cs.RO

    Multimodal Large Language Model for Visual Navigation

    Authors: Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, Bowen Zhang, Jian Zhang

    Abstract: Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems. These systems incorporate instructions, observations, and history into massive text prompts, which are then combined with pre-trained large language models to facilitate visual navigation. In contrast, our approach aims to fine-tune large language models for visual navig…

    Submitted 6 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

  5. arXiv:2309.12479  [pdf, other]

    cs.RO

    Human Following in Mobile Platforms with Person Re-Identification

    Authors: Mario Srouji, Yao-Hung Hubert Tsai, Hugues Thomas, Jian Zhang

    Abstract: Human following is a crucial feature of human-robot interaction, yet it poses numerous challenges to mobile agents in real-world scenarios. Some major hurdles are that the target person may be in a crowd, obstructed by others, or facing away from the agent. To tackle these challenges, we present a novel person re-identification module composed of three parts: a 360-degree visual registration, a ne…

    Submitted 21 September, 2023; originally announced September 2023.

  6. arXiv:2212.05923  [pdf, other]

    cs.RO cs.LG

    Self-Supervised Object Goal Navigation with In-Situ Finetuning

    Authors: So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang

    Abstract: A household robot should be able to navigate to target objects without requiring users to first annotate everything in their home. Most current approaches to object navigation do not test on real robots and rely solely on reconstructed scans of houses and their expensively labeled semantic 3D meshes. In this work, our goal is to build an agent that builds self-supervised models of the world via ex…

    Submitted 1 April, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  7. arXiv:2210.12562  [pdf, other]

    cs.LG

    Greedy Modality Selection via Approximate Submodular Maximization

    Authors: Runxiang Cheng, Gargi Balasubramaniam, Yifei He, Yao-Hung Hubert Tsai, Han Zhao

    Abstract: Multimodal learning considers learning from multi-modality data, aiming to fuse heterogeneous sources of information. However, it is not always feasible to leverage all available modalities due to memory constraints. Further, training on all the modalities may be inefficient when redundant information exists within data, such as different subsets of modalities providing similar performance. In lig…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Uncertainty in Artificial Intelligence (UAI) 2022

  8. arXiv:2209.13156  [pdf, other]

    cs.CV cs.AI

    Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents

    Authors: Yao-Hung Hubert Tsai, Hanlin Goh, Ali Farhadi, Jian Zhang

    Abstract: The perception system in personalized mobile agents requires developing indoor scene understanding models, which can understand 3D geometries, capture objectiveness, analyze human behaviors, etc. Nonetheless, this direction has not been well-explored in comparison with models for outdoor environments (e.g., the autonomous driving system that includes pedestrian prediction, car detection, traffic s…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Submitted to ICRA2023

  9. arXiv:2209.12343  [pdf, other]

    cs.CV cs.LG

    Paraphrasing Is All You Need for Novel Object Captioning

    Authors: Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang

    Abstract: Novel object captioning (NOC) aims to describe images containing objects without observing their ground truth captions during training. Due to the absence of caption annotation, captioning models cannot be directly optimized via sequence-to-sequence training or CIDEr optimization. As a result, we present Paraphrasing-to-Captioning (P2C), a two-stage learning framework for NOC, which would heuristi…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022

  10. arXiv:2207.10950  [pdf, other]

    cs.CV

    Scale dependant layer for self-supervised nuclei encoding

    Authors: Peter Naylor, Yao-Hung Hubert Tsai, Marick Laé, Makoto Yamada

    Abstract: Recent developments in self-supervised learning give us the possibility to further reduce human intervention in multi-step pipelines where the focus evolves around particular objects of interest. In the present paper, the focus lays in the nuclei in histopathology images. In particular we aim at extracting cellular information in an unsupervised manner for a downstream task. As nuclei present them…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: 13 pages, 6 figures, 2 tables

    MSC Class: 68Uxx ACM Class: F.2.2; I.2.7

  11. arXiv:2202.06670  [pdf, other]

    cs.LG cs.AI

    Learning Weakly-Supervised Contrastive Representations

    Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stag…

    Submitted 18 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Published as ICLR 2022. arXiv admin note: substantial text overlap with arXiv:2106.02869

  12. arXiv:2202.05458  [pdf, other]

    cs.LG

    Conditional Contrastive Learning with Kernel

    Authors: Yao-Hung Hubert Tsai, Tianqin Li, Martin Q. Ma, Han Zhao, Kun Zhang, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Conditional contrastive learning frameworks consider the conditional sampling procedure that constructs positive or negative data pairs conditioned on specific variables. Fair contrastive learning constructs negative pairs, for example, from the same gender (conditioning on sensitive information), which in turn reduces undesirable information from the learned representations; weakly supervised con…

    Submitted 15 March, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  13. arXiv:2106.07447  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

    Authors: Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

    Abstract: Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for…

    Submitted 14 June, 2021; originally announced June 2021.

  14. arXiv:2106.02869  [pdf, other]

    cs.LG

    Integrating Auxiliary Information in Self-supervised Learning

    Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the struct…

    Submitted 5 June, 2021; originally announced June 2021.

  15. arXiv:2106.02866  [pdf, other]

    cs.LG

    Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning

    Authors: Martin Q. Ma, Yao-Hung Hubert Tsai, Paul Pu Liang, Han Zhao, Kun Zhang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Contrastive self-supervised learning (SSL) learns an embedding space that maps similar data pairs closer and dissimilar data pairs farther apart. Despite its success, one issue has been overlooked: the fairness aspect of representations learned using contrastive SSL. Without mitigation, contrastive SSL techniques can incorporate sensitive information such as gender or race and cause potentially un…

    Submitted 27 June, 2022; v1 submitted 5 June, 2021; originally announced June 2021.

  16. arXiv:2104.13712  [pdf, other]

    cs.LG

    A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning

    Authors: Yao-Hung Hubert Tsai, Shaojie Bai, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In this report, we relate the algorithmic design of Barlow Twins' method to the Hilbert-Schmidt Independence Criterion (HSIC), thus establishing it as a contrastive learning approach that is free of negative samples. Through this perspective, we argue that Barlow Twins (and thus the class of negative-sample-free contrastive learning methods) suggests a possibility to bridge the two major families…

    Submitted 28 April, 2021; originally announced April 2021.

  17. arXiv:2103.11275  [pdf, other]

    cs.LG cs.IT

    Self-supervised Representation Learning with Relative Predictive Coding

    Authors: Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the success of RPC is two-fold. First, RPC introduces the relative parameters to regularize the objective for boundedness and low variance. Second, RPC contains no…

    Submitted 12 April, 2021; v1 submitted 20 March, 2021; originally announced March 2021.

  18. arXiv:2010.14543  [pdf, other]

    cs.LG cs.CV cs.RO

    Unsupervised Domain Adaptation for Visual Navigation

    Authors: Shangda Li, Devendra Singh Chaplot, Yao-Hung Hubert Tsai, Yue Wu, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Advances in visual navigation methods have led to intelligent embodied navigation agents capable of learning meaningful representations from raw RGB images and perform a wide variety of tasks involving structural and semantic reasoning. However, most learning-based navigation policies are trained and tested in simulation environments. In order for these policies to be practically useful, they need…

    Submitted 12 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Deep Reinforcement Learning Workshop at NeurIPS 2020. Camera Ready Version

  19. arXiv:2006.05576  [pdf, other]

    cs.LG stat.ML

    Self-supervised Learning from a Multi-view Perspective

    Authors: Yao-Hung Hubert Tsai, Yue Wu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: As a subset of unsupervised representation learning, self-supervised representation learning adopts self-defined signals as supervision and uses the learned representation for downstream tasks, such as object detection and image captioning. Many proposed approaches for self-supervised learning follow naturally a multi-view perspective, where the input (e.g., original images) and the self-supervise…

    Submitted 22 March, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  20. arXiv:2006.05553  [pdf, other]

    cs.LG stat.ME stat.ML

    Neural Methods for Point-wise Dependency Estimation

    Authors: Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Since its inception, the neural estimation of mutual information (MI) has demonstrated the empirical success of modeling expected dependency between high-dimensional random variables. However, MI is an aggregate statistic and cannot be used to measure point-wise dependency between different events. In this work, instead of estimating the expected dependency, we focus on estimating point-wise depen…

    Submitted 14 October, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  21. arXiv:2005.12123  [pdf, other]

    stat.ML cs.LG

    Feature Robust Optimal Transport for High-dimensional Data

    Authors: Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada

    Abstract: Optimal transport is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust optimal transport (FROT) for high-dimensional data, which solves high-dimensional OT problems using feature selection to avoid the curse of dimensionality. Specifically, we find a transport plan with di…

    Submitted 29 September, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

  22. arXiv:2004.14198  [pdf, other]

    cs.CL

    Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis

    Authors: Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: The human language can be expressed through multiple sources of information known as modalities, including tones of voice, facial gestures, and spoken language. Recent multimodal learning with strong performances on human-centric tasks such as sentiment analysis and emotion recognition are often black-box, with very limited interpretability. In this paper we propose Multimodal Routing, which dynam…

    Submitted 5 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  23. arXiv:2002.04764  [pdf, other]

    cs.LG stat.ML

    Capsules with Inverted Dot-Product Attention Routing

    Authors: Yao-Hung Hubert Tsai, Nitish Srivastava, Hanlin Goh, Ruslan Salakhutdinov

    Abstract: We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent's state and the child's vote. The new mechanism 1) designs routing via inverted dot-product attention; 2) imposes Layer Normalization as normalization; and 3) replaces sequential iterative routing with concurrent iterative routing. When compared to pr…

    Submitted 26 February, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  24. arXiv:1910.10202  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Complex Transformer: A Framework for Modeling Complex-Valued Sequence

    Authors: Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov

    Abstract: While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models barely use complex numbers. However, speech, signal and audio data are naturally complex-valued after Fourier Transform, and studies have shown a potentially richer representation of complex nets. In this paper, we propose a Complex Transformer, which incorporates the transformer…

    Submitted 6 August, 2021; v1 submitted 22 October, 2019; originally announced October 2019.

  25. arXiv:1909.02373  [pdf, other]

    stat.ML cs.LG

    LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport

    Authors: Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang

    Abstract: Estimating mutual information is an important statistics and machine learning problem. To estimate the mutual information from data, a common practice is preparing a set of paired samples $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n \stackrel{\mathrm{i.i.d.}}{\sim} p(\mathbf{x},\mathbf{y})$. However, in many situations, it is difficult to obtain a large number of data pairs. To address this problem, w…

    Submitted 27 June, 2021; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: ECML/PKDD 2021

  26. arXiv:1908.11775  [pdf, ps, other]

    cs.LG stat.ML

    Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

    Authors: Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the attention mechanism, which concurrently processes all inputs in the streams. In this paper, we present a new formulation of attention via the lens of the kernel. To…

    Submitted 11 November, 2019; v1 submitted 30 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  27. arXiv:1907.06288  [pdf, other]

    cs.LG stat.ML

    Learning Neural Networks with Adaptive Regularization

    Authors: Han Zhao, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Geoffrey J. Gordon

    Abstract: Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on w…

    Submitted 23 October, 2019; v1 submitted 14 July, 2019; originally announced July 2019.

    Comments: Camera ready version

  28. arXiv:1907.01011  [pdf, other]

    cs.LG cs.CL stat.ML

    Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization

    Authors: Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: There has been an increased interest in multimodal language processing including multimodal dialog, question answering, sentiment analysis, and speech recognition. However, naturally occurring multimodal data is often imperfect as a result of imperfect modalities, missing entries or noise corruption. To address these concerns, we present a regularization method based on tensor rank minimization. O…

    Submitted 1 July, 2019; originally announced July 2019.

  29. arXiv:1906.02125  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS stat.ML

    Strong and Simple Baselines for Multimodal Utterance Embeddings

    Authors: Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Human language is a rich multimodal signal consisting of spoken words, facial expressions, body gestures, and vocal intonations. Learning representations for these spoken utterances is a complex research problem due to the presence of multiple heterogeneous sources of information. Recent advances in multimodal learning have followed the general trend of building more complex models that utilize va…

    Submitted 28 February, 2020; v1 submitted 14 May, 2019; originally announced June 2019.

    Comments: NAACL 2019 oral presentation

  30. arXiv:1906.00295  [pdf, other]

    cs.CL

    Multimodal Transformer for Unaligned Multimodal Language Sequences

    Authors: Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Human language is often multimodal, which comprehends a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges in modeling such multimodal human language time-series data exist: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this pa…

    Submitted 1 June, 2019; originally announced June 2019.

  31. arXiv:1903.10547  [pdf, other]

    cs.CV

    Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

    Authors: Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi

    Abstract: Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts. For example, a relationship 'man, open, door' involves a complex relation 'open' between concrete entities 'man, door'. While much of the existing work has studied this problem in the context of still images, understanding visual relationships in videos has received limited a…

    Submitted 27 March, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

    Comments: CVPR 2019. Supplementary included. Fixing a small typo

  32. arXiv:1806.06176  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Learning Factorized Multimodal Representations

    Authors: Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information, there are two key challenges to address when learning from multimodal data: 1) models must learn the complex intra-modal and cross-modal interactions for predictio…

    Submitted 14 May, 2019; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: ICLR 2019

  33. arXiv:1802.05411  [pdf, ps, other]

    cs.LG stat.ML

    Selecting the Best in GANs Family: a Post Selection Inference Framework

    Authors: Yao-Hung Hubert Tsai, Makoto Yamada, Denny Wu, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu

    Abstract: "Which Generative Adversarial Networks (GANs) generates the most plausible images?" has been a frequently asked question among researchers. To address this problem, we first propose an \emph{incomplete} U-statistics estimate of maximum mean discrepancy $\mathrm{MMD}_{inc}$ to measure the distribution discrepancy between generated and real images. $\mathrm{MMD}_{inc}$ enjoys the advantages of asymp…

    Submitted 23 June, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

  34. arXiv:1802.05408  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    "Dependency Bottleneck" in Auto-encoding Architectures: an Empirical Study

    Authors: Denny Wu, Yixiu Zhao, Yao-Hung Hubert Tsai, Makoto Yamada, Ruslan Salakhutdinov

    Abstract: Recent works investigated the generalization properties in deep neural networks (DNNs) by studying the Information Bottleneck in DNNs. However, the measurement of the mutual information (MI) is often inaccurate due to the density estimation. To address this issue, we propose to measure the dependency instead of MI between layers in DNNs. Specifically, we propose to use Hilbert-Schmidt Independen…

    Submitted 15 February, 2018; originally announced February 2018.

  35. arXiv:1711.03167  [pdf, other]

    cs.LG

    Learning Markov Chain in Unordered Dataset

    Authors: Yao-Hung Hubert Tsai, Han Zhao, Ruslan Salakhutdinov, Nebojsa Jojic

    Abstract: The assumption that data samples are independently identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structure in practice, and we argue that there exist some unknown order within the data instances. In this technical report, we introduce OrderNet that can be used to extract the order of data instances in an unsupervised way. By assuming…

    Submitted 5 March, 2019; v1 submitted 8 November, 2017; originally announced November 2017.

    Comments: This would be the final update for this technical report on learning Markov Chain in the unordered dataset

  36. arXiv:1710.08347  [pdf, other]

    cs.LG

    Improving One-Shot Learning through Fusing Side Information

    Authors: Yao-Hung Hubert Tsai, Ruslan Salakhutdinov

    Abstract: Deep Neural Networks (DNNs) often struggle with one-shot learning where we have only one or a few labeled training examples per category. In this paper, we argue that by using side information, we may compensate the missing information across classes. We introduce two statistical approaches for fusing side information into data representation learning to improve one-shot learning. First, we propos…

    Submitted 22 January, 2018; v1 submitted 23 October, 2017; originally announced October 2017.

  37. arXiv:1706.02295  [pdf, other]

    cs.LG

    Generative-Discriminative Variational Model for Visual Recognition

    Authors: Chih-Kuan Yeh, Yao-Hung Hubert Tsai, Yu-Chiang Frank Wang

    Abstract: The paradigm shift from shallow classifiers with hand-crafted features to end-to-end trainable deep learning models has shown significant improvements on supervised learning tasks. Despite the promising power of deep neural networks (DNN), how to alleviate overfitting during training has been a research topic of interest. In this paper, we present a Generative-Discriminative Variational Model (GDV…

    Submitted 7 June, 2017; originally announced June 2017.

  38. arXiv:1703.05908  [pdf, other]

    cs.CV cs.CL cs.LG

    Learning Robust Visual-Semantic Embeddings

    Authors: Yao-Hung Hubert Tsai, Liang-Kang Huang, Ruslan Salakhutdinov

    Abstract: Many of the existing methods for learning joint embedding of images and text use only supervised information from paired images and its textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that is able to extract more robust multi-modal representations across domains. The proposed method combines re…

    Submitted 19 March, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

    Comments: 12 pages