
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner to systematically create towers with meaningful feature interactions and load balanced assignments to preserve model quality and training throughput via learned embeddings. We show that DMT can achieve up to 1.9x speedup compared to the state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales.

Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2308.13217

GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

Authors: Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi, Renjie Liao

Abstract: Echocardiography (echo) is an ultrasound imaging modality that is widely used for various cardiovascular diagnosis tasks. Due to inter-observer variability in echo-based diagnosis, which arises from the variability in echo image acquisition and the interpretation of echo images based on clinical experience, vision-based machine learning (ML) methods have gained popularity to act as secondary layers of verification. For such safety-critical applications, it is essential for any proposed ML method to present a level of explainability along with good accuracy. In addition, such methods must be able to process several echo videos obtained from various heart views and the interactions among them to properly produce predictions for a variety of cardiovascular measurements or interpretation tasks. Prior work lacks explainability or is limited in scope by focusing on a single cardiovascular task. To remedy this, we propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability, while simultaneously enabling multi-video training where the inter-play among echo image patches in the same frame, all frames in the same video, and inter-video relationships are captured based on a downstream task. We show the flexibility of our framework by considering two critical tasks including ejection fraction (EF) and aortic stenosis (AS) severity detection. Our model achieves mean absolute errors of 4.15 and 4.84 for single and dual-video EF estimation and an accuracy of 96.5% for AS detection, while providing informative task-specific attention maps and prototypical explainability.

Submitted 25 August, 2023; originally announced August 2023.

Comments: To be published in MLMI 2023

arXiv:2307.14433

ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography

Authors: Hooman Vaseli, Ang Nan Gu, S. Neda Ahmadi Amiri, Michael Y. Tsang, Andrea Fung, Nima Kondori, Armin Saadat, Purang Abolmaesumi, Teresa S. M. Tsang

Abstract: Aortic stenosis (AS) is a common heart valve disease that requires accurate and timely diagnosis for appropriate treatment. Most current automatic AS severity detection methods rely on black-box models with a low level of trustworthiness, which hinders clinical adoption. To address this issue, we propose ProtoASNet, a prototypical network that directly detects AS from B-mode echocardiography videos, while making interpretable predictions based on the similarity between the input and learned spatio-temporal prototypes. This approach provides supporting evidence that is clinically relevant, as the prototypes typically highlight markers such as calcification and restricted movement of aortic valve leaflets. Moreover, ProtoASNet utilizes abstention loss to estimate aleatoric uncertainty by defining a set of prototypes that capture ambiguity and insufficient information in the observed data. This provides a reliable system that can detect and explain when it may fail. We evaluate ProtoASNet on a private dataset and the publicly available TMED-2 dataset, where it outperforms existing state-of-the-art methods with an accuracy of 80.0% and 79.7%, respectively. Furthermore, ProtoASNet provides interpretability and an uncertainty measure for each prediction, which can improve transparency and facilitate the interactive usage of deep networks to aid clinical decision-making. Our source code is available at: https://github.com/hooman007/ProtoASNet.

Submitted 26 July, 2023; originally announced July 2023.

Comments: To be published in MICCAI 2023

arXiv:2307.12229

EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

Authors: Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao

Abstract: The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing. However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.

Submitted 23 July, 2023; originally announced July 2023.

Comments: To be published in MICCAI 2023

arXiv:2203.11014

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

Authors: Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen

Abstract: Learning feature interactions is important to the model performance of online advertising services. As a result, extensive efforts have been devoted to designing effective architectures to learn feature interactions. However, we observe that the practical performance of those designs can vary from dataset to dataset, even when the order of interactions claimed to be captured is the same. That indicates different designs may have different advantages and the interactions captured by them have non-overlapping information. Motivated by this observation, we propose DHEN - a deep and hierarchical ensemble architecture that can leverage strengths of heterogeneous interaction modules and learn a hierarchy of the interactions under different orders. To overcome the challenge brought by DHEN's deeper and multi-layer structure in training, we propose a novel co-designed training system that can further improve the training efficiency of DHEN. Experiments of DHEN on large-scale dataset from CTR prediction tasks attained 0.27% improvement on the Normalized Entropy (NE) of prediction and 1.2x better training throughput than state-of-the-art baseline, demonstrating their effectiveness in practice.

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2103.03103

Interpretable Artificial Intelligence through the Lens of Feature Interaction

Authors: Michael Tsang, James Enouen, Yan Liu

Abstract: Interpretation of deep learning models is a very challenging problem because of their large number of parameters, complex connections between nodes, and unintelligible feature representations. Despite this, many view interpretability as a key solution to trustworthiness, fairness, and safety, especially as deep learning is applied to more critical decision tasks like credit approval, job screening, and recidivism prediction. There is an abundance of good research providing interpretability to deep learning models; however, many of the commonly used methods do not consider a phenomenon called "feature interaction." This work first explains the historical and modern importance of feature interactions and then surveys the modern interpretability methods which do explicitly consider feature interactions. This survey aims to bring to light the importance of feature interactions in the larger context of machine learning interpretability, especially in a modern context where deep learning models heavily rely on feature interactions.

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2102.01586

U-LanD: Uncertainty-Driven Video Landmark Detection

Authors: Mohammad H. Jafari, Christina Luong, Michael Tsang, Ang Nan Gu, Nathan Van Woudenberg, Robert Rohling, Teresa Tsang, Purang Abolmaesumi

Abstract: This paper presents U-LanD, a framework for joint detection of key frames and landmarks in videos. We tackle a specifically challenging problem, where training labels are noisy and highly sparse. U-LanD builds upon a pivotal observation: a deep Bayesian landmark detector solely trained on key video frames, has significantly lower predictive uncertainty on those frames vs. other frames in videos. We use this observation as an unsupervised signal to automatically recognize key frames on which we detect landmarks. As a test-bed for our framework, we use ultrasound imaging videos of the heart, where sparse and noisy clinical labels are only available for a single frame in each video. Using data from 4,493 patients, we demonstrate that U-LanD can exceedingly outperform the state-of-the-art non-Bayesian counterpart by a noticeable absolute margin of 42% in R2 score, with almost no overhead imposed on the model size. Our approach is generic and can be potentially applied to other challenging data with noisy and sparse training labels.

Submitted 2 February, 2021; originally announced February 2021.

arXiv:2006.15473

Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes

Authors: Loc Trinh, Michael Tsang, Sirisha Rambhatla, Yan Liu

Abstract: In this paper we propose a novel human-centered approach for detecting forgery in face images, using dynamic prototypes as a form of visual explanations. Currently, most state-of-the-art deepfake detections are based on black-box models that process videos frame-by-frame for inference, and few closely examine their temporal inconsistencies. However, the existence of such temporal artifacts within deepfake videos is key in detecting and explaining deepfakes to a supervising human. To this end, we propose Dynamic Prototype Network (DPNet) -- an interpretable and effective solution that utilizes dynamic representations (i.e., prototypes) to explain deepfake temporal artifacts. Extensive experimental results show that DPNet achieves competitive predictive performance, even on unseen testing datasets such as Google's DeepFakeDetection, DeeperForensics, and Celeb-DF, while providing easy referential explanations of deepfake dynamics. On top of DPNet's prototypical framework, we further formulate temporal logic specifications based on these dynamics to check our model's compliance to desired temporal behaviors, hence providing trustworthiness for such critical detection systems.

Submitted 14 January, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

Comments: To appear in the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV 21')

arXiv:2006.10966

Feature Interaction Interpretability: A Case for Explaining Ad-Recommendation Systems via Neural Interaction Detection

Authors: Michael Tsang, Dehua Cheng, Hanpeng Liu, Xue Feng, Eric Zhou, Yan Liu

Abstract: Recommendation is a prevalent application of machine learning that affects many users; therefore, it is important for recommender models to be accurate and interpretable. In this work, we propose a method to both interpret and augment the predictions of black-box recommender systems. In particular, we propose to interpret feature interactions from a source recommender model and explicitly encode these interactions in a target recommender model, where both source and target models are black-boxes. By not assuming the structure of the recommender system, our approach can be used in general settings. In our experiments, we focus on a prominent use of machine learning recommendation: ad-click prediction. We found that our interaction interpretations are both informative and predictive, e.g., significantly outperforming existing recommender models. What's more, the same approach to interpret interactions can provide new insights into domains even beyond recommendation, such as text and image classification.

Submitted 19 June, 2020; originally announced June 2020.

Comments: Published in ICLR 2020

arXiv:2006.10965

How does this interaction affect me? Interpretable attribution for feature interactions

Authors: Michael Tsang, Sirisha Rambhatla, Yan Liu

Abstract: Machine learning transparency calls for interpretable explanations of how inputs relate to predictions. Feature attribution is a way to analyze the impact of features on predictions. Feature interactions are the contextual dependence between features that jointly impact predictions. There are a number of methods that extract feature interactions in prediction models; however, the methods that assign attributions to interactions are either uninterpretable, model-specific, or non-axiomatic. We propose an interaction attribution and detection framework called Archipelago which addresses these problems and is also scalable in real-world settings. Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods, which is important for analyzing the impact of interactions on predictions. We also provide accompanying visualizations of our approach that give new insights into deep neural networks.

Submitted 19 June, 2020; originally announced June 2020.

arXiv:1906.04664

Extracting Interpretable Concept-Based Decision Trees from CNNs

Authors: Conner Chyung, Michael Tsang, Yan Liu

Abstract: In an attempt to gather a deeper understanding of how convolutional neural networks (CNNs) reason about human-understandable concepts, we present a method to infer labeled concept data from hidden layer activations and interpret the concepts through a shallow decision tree. The decision tree can provide information about which concepts a model deems important, as well as provide an understanding of how the concepts interact with each other. Experiments demonstrate that the extracted decision tree is capable of accurately representing the original CNN's classifications at low tree depths, thus encouraging human-in-the-loop understanding of discriminative concepts.

Submitted 16 June, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: presented at 2019 ICML Workshop on Human in the Loop Learning (HILL 2019), Long Beach, USA

arXiv:1812.04801

Can I trust you more? Model-Agnostic Hierarchical Explanations

Authors: Michael Tsang, Youbang Sun, Dongxu Ren, Yan Liu

Abstract: Interactions such as double negation in sentences and scene interactions in images are common forms of complex dependencies captured by state-of-the-art machine learning models. We propose Mahé, a novel approach to provide Model-agnostic hierarchical éxplanations of how powerful machine learning models, such as deep neural networks, capture these interactions as either dependent on or free of the context of data instances. Specifically, Mahé provides context-dependent explanations by a novel local interpretation algorithm that effectively captures any-order interactions, and obtains context-free explanations through generalizing context-dependent interactions to explain global behaviors. Experimental results show that Mahé obtains improved local interaction interpretations over state-of-the-art methods and successfully explains interactions that are context-free.

Submitted 11 December, 2018; originally announced December 2018.

arXiv:1705.04977

Detecting Statistical Interactions from Neural Network Weights

Authors: Michael Tsang, Dehua Cheng, Yan Liu

Abstract: Interpreting neural networks is a crucial and challenging task in machine learning. In this paper, we develop a novel framework for detecting statistical interactions captured by a feedforward multilayer neural network by directly interpreting its learned weights. Depending on the desired interactions, our method can achieve significantly better or similar interaction detection performance compared to the state-of-the-art without searching an exponential solution space of possible interactions. We obtain this accuracy and efficiency by observing that interactions between input features are created by the non-additive effect of nonlinear activation functions, and that interacting paths are encoded in weight matrices. We demonstrate the performance of our method and the importance of discovered interactions via experimental results on both synthetic datasets and real-world application datasets.

Submitted 27 February, 2018; v1 submitted 14 May, 2017; originally announced May 2017.

Comments: Published in ICLR 2018

arXiv:1310.0291

doi 10.1109/ISIT.2014.6874847

Mismatched Quantum Filtering and Entropic Information

Authors: Mankei Tsang

Abstract: Quantum filtering is a signal processing technique that estimates the posterior state of a quantum system under continuous measurements and has become a standard tool in quantum information processing, with applications in quantum state preparation, quantum metrology, and quantum control. If the filter assumes a nominal model that differs from reality, however, the estimation accuracy is bound to suffer. Here I derive identities that relate the excess error caused by quantum filter mismatch to the relative entropy between the true and nominal observation probability measures, with one identity for Gaussian measurements, such as optical homodyne detection, and another for Poissonian measurements, such as photon counting. These identities generalize recent seminal results in classical information theory and provide new operational meanings to relative entropy, mutual information, and channel capacity in the context of quantum experiments.

Submitted 27 January, 2014; v1 submitted 1 October, 2013; originally announced October 2013.

Comments: v1: first draft, 8 pages, v2: added introduction and more results on mutual information and channel capacity, 12 pages, v3: minor updates, v4: updated the presentation

Journal ref: Proceedings of IEEE International Symposium on Information Theory, Honolulu, Hawaii, USA, June 29-July 4 2014, pp. 321-325

arXiv:1104.1007

Coding the Beams: Improving Beamforming Training in mmWave Communication System

Authors: Y. Ming Tsang, Ada S. Y. Poon, Sateesh Addepalli

Abstract: The mmWave communication system is operating at a regime with high number of antennas and very limited number of RF analog chains. Large number of antennas are used to extend the communication range for recovering the high path loss while fewer RF analog chains are designed to reduce transmit and processing power and hardware complexity. In this regime, typical MIMO algorithms are not applicable. Before any communication starts, devices are needed to align their beam pointing angles towards each other. An efficient searching protocol to obtain the best beam angle pair is therefore needed. It is called BeamForming (BF) training protocol. This paper presents a new BF training technique called beam coding. Each beam angle is assigned unique signature code. By coding multiple beam angles and steering at their angles simultaneously in a training packet, the best beam angle pair can be obtained in a few packets. The proposed BF training technique not only shows the robustness in non-line-of-sight environment, but also provides very flat power variations within a packet in contrast to the IEEE 802.11ad standard whose scheme may lead to large dynamic range of signals due to beam angles varying across a training packet.

Submitted 1 August, 2012; v1 submitted 6 April, 2011; originally announced April 2011.

Comments: 6 pages, 10 figures, in GLOBECOM 2011. (Figure 8 and 9 are updated)

MSC Class: 94A05

Showing 1–15 of 15 results for author: Tsang, M