Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction

Authors: Tu T. Do, Mai Anh Vu, Tuan L. Vo, Hoang Thien Ly, Thu Nguyen, Steven A. Hicks, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen

Abstract: Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially with the increasing size of datasets. To address this issue, we propose a Blockwise principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data and then imputes on merging the obtained principal components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. In addition, our experiments also show that while applying MICE imputation directly on missing data may not yield convergence, applying BPI with MICE for the data may lead to convergence.

Submitted 10 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2205.15150

Principal Component Analysis based frameworks for efficient missing data imputation algorithms

Authors: Thu Nguyen, Hoang Thien Ly, Michael Alexander Riegler, Pål Halvorsen, Hugo L. Hammer

Abstract: Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, the data nowadays tends toward high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but versatile framework based on Principal Component Analysis (PCA) to speed up the imputation process and alleviate memory issues of many available imputation techniques, without sacrificing the imputation quality in term of MSE. In addition, the frameworks can be used even when some or all of the missing features are categorical, or when the number of missing features is large. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI for classification problems with some adjustments. We validate our approach by experiments on various scenarios, which shows that PCAI and PIC can work with various imputation algorithms, including the state-of-the-art ones and improve the imputation speed significantly, while achieving competitive mean square error/classification accuracy compared to direct imputation (i.e., impute directly on the missing data).

Submitted 19 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2202.05057

A VM/Containerized Approach for Scaling TinyML Applications

Authors: Meelis Lootus, Kartik Thakore, Sam Leroux, Geert Trooskens, Akshay Sharma, Holly Ly

Abstract: Although deep neural networks are typically computationally expensive to use, technological advances in both the design of hardware platforms and of neural network architectures, have made it possible to use powerful models on edge devices. To enable widespread adoption of edge based machine learning, we introduce a set of open-source tools that make it easy to deploy, update and monitor machine learning models on a wide variety of edge devices. Our tools bring the concept of containerization to the TinyML world. We propose to package ML and application logic as containers called Runes to deploy onto edge devices. The containerization allows us to target a fragmented Internet-of-Things (IoT) ecosystem by providing a common platform for Runes to run across devices.

Submitted 10 February, 2022; originally announced February 2022.

Comments: Presented at the tinyML 2021 Research Symposium

arXiv:1909.12695

Joint Optimization of Execution Latency and Energy Consumption for Mobile Edge Computing with Data Compression and Task Allocation

Authors: Minh Hoang Ly, Thinh Quang Dinh, Ha Hoang Kha

Abstract: In this paper, we consider the mobile edge offloading scenario consisting of one mobile device (MD) with multiple independent tasks and various remote edge devices. In order to save energy, the user's device can offload the tasks to available access points for edge computing. Data compression is applied to reduce offloaded data size prior to wireless transmission to minimize the execution latency. The problem of jointly optimizing the task allocation decision and the data compression ratio to minimize the total tasks' execution latency and the MD's energy consumption concurrently is proposed. We show that the design problem is a non-convex optimization one but it can be transformed into a convex one through a semidefinite relaxation (SDR) based approach. Numerical simulations demonstrate the outperformance of the proposed scheme compared to the benchmark one.

Submitted 10 October, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: ISEE 2019

arXiv:1201.1935

doi 10.1109/TIT.2013.2245395

Secure Symmetrical Multilevel Diversity Coding

Authors: Anantharaman Balasubramanian, Hung D. Ly, Shuo Li, Tie Liu, Scott L. Miller

Abstract: Symmetrical Multilevel Diversity Coding (SMDC) is a network compression problem introduced by Roche (1992) and Yeung (1995). In this setting, a simple separate coding strategy known as superposition coding was shown to be optimal in terms of achieving the minimum sum rate (Roche, Yeung, and Hau, 1997) and the entire admissible rate region (Yeung and Zhang, 1999) of the problem. This paper considers a natural generalization of SMDC to the secure communication setting with an additional eavesdropper. It is required that all sources need to be kept perfectly secret from the eavesdropper as long as the number of encoder outputs available at the eavesdropper is no more than a given threshold. First, the problem of encoding individual sources is studied. A precise characterization of the entire admissible rate region is established via a connection to the problem of secure coding over a three-layer wiretap network and utilizing some basic polyhedral structure of the admissible rate region. Building on this result, it is then shown that superposition coding remains optimal in terms of achieving the minimum sum rate for the general secure SMDC problem.

Submitted 9 January, 2012; originally announced January 2012.

Comments: Submitted to the IEEE Transactions on Information Theory in May 2011. Minor revision made to the current version in January 2012

arXiv:1102.1475

Security Embedding Codes

Authors: Hung D. Ly, Tie Liu, Yufei Blankenship

Abstract: This paper considers the problem of simultaneously communicating two messages, a high-security message and a low-security message, to a legitimate receiver, referred to as the security embedding problem. An information-theoretic formulation of the problem is presented. A coding scheme that combines rate splitting, superposition coding, nested binning and channel prefixing is considered and is shown to achieve the secrecy capacity region of the channel in several scenarios. Specifying these results to both scalar and independent parallel Gaussian channels (under an average individual per-subchannel power constraint), it is shown that the high-security message can be embedded into the low-security message at full rate (as if the low-security message does not exist) without incurring any loss on the overall rate of communication (as if both messages are low-security messages). Extensions to the wiretap channel II setting of Ozarow and Wyner are also considered, where it is shown that "perfect" security embedding can be achieved by an encoder that uses a two-level coset code.

Submitted 7 February, 2011; originally announced February 2011.

Comments: Submitted to the IEEE Transactions on Information Forensics and Security

arXiv:0907.2599

doi 10.1109/TIT.2010.2069190

Multiple-Input Multiple-Output Gaussian Broadcast Channels with Common and Confidential Messages

Authors: Hung D. Ly, Tie Liu, Yingbin Liang

Abstract: This paper considers the problem of the multiple-input multiple-output (MIMO) Gaussian broadcast channel with two receivers (receivers 1 and 2) and two messages: a common message intended for both receivers and a confidential message intended only for receiver 1 but needing to be kept asymptotically perfectly secure from receiver 2. A matrix characterization of the secrecy capacity region is established via a channel enhancement argument. The enhanced channel is constructed by first splitting receiver 1 into two virtual receivers and then enhancing only the virtual receiver that decodes the confidential message. The secrecy capacity region of the enhanced channel is characterized using an extremal entropy inequality previously established for characterizing the capacity region of a degraded compound MIMO Gaussian broadcast channel.

Submitted 15 July, 2009; originally announced July 2009.

Comments: Submitted to the IEEE Transactions on Information Theory, July 2009

Showing 1–7 of 7 results for author: Ly, H