-
Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction
Authors:
Tu T. Do,
Mai Anh Vu,
Tuan L. Vo,
Hoang Thien Ly,
Thu Nguyen,
Steven A. Hicks,
Michael A. Riegler,
Pål Halvorsen,
Binh T. Nguyen
Abstract:
Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially as datasets grow larger. To address this issue, we propose a Blockwise principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data and then imputes the merged principal components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. They also show that while applying MICE imputation directly to the missing data may fail to converge, applying BPI with MICE may lead to convergence.
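The BPI pipeline described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pca_scores`, `mean_impute`, `blockwise_pca_impute`) are hypothetical, PCA is computed with a plain SVD, the caller supplies the column blocks and the number of components per block, and a column-mean imputer stands in for the chosen imputation technique (e.g., MICE). Within a row, each block is assumed to be either fully observed or fully missing, as in a monotone pattern, and each block needs at least as many observed rows as requested components.

```python
import numpy as np


def pca_scores(block, n_components):
    """Project the rows of a fully observed block onto its top principal axes."""
    centered = block - block.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def mean_impute(x):
    """Stand-in imputer (assumption): replace each NaN with its column mean."""
    x = x.copy()
    col_means = np.nanmean(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    return x


def blockwise_pca_impute(data, blocks, n_components, impute=mean_impute):
    """Hypothetical BPI sketch: PCA per monotone block on its observed rows,
    merge the component scores, then impute the merged matrix."""
    merged = []
    for cols, k in zip(blocks, n_components):
        block = data[:, cols]
        # A row counts as observed for this block only if no entry is missing.
        observed = ~np.isnan(block).any(axis=1)
        scores = np.full((data.shape[0], k), np.nan)
        scores[observed] = pca_scores(block[observed], k)
        merged.append(scores)
    # Rows missing a block keep NaN scores, which the imputer then fills.
    return impute(np.hstack(merged))
```

Because the imputer is a parameter, a different technique can be swapped in through the `impute` argument without changing the per-block PCA step; the imputer then only sees the reduced set of component scores rather than every original feature.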
Submitted 10 January, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Principal Component Analysis based frameworks for efficient missing data imputation algorithms
Authors:
Thu Nguyen,
Hoang Thien Ly,
Michael Alexander Riegler,
Pål Halvorsen,
Hugo L. Hammer
Abstract:
Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, data nowadays tend to be high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but versatile framework based on Principal Component Analysis (PCA) to speed up the imputation process and alleviate memory issues of many available imputation techniques, without sacrificing imputation quality in terms of MSE. In addition, the framework can be used even when some or all of the missing features are categorical, or when the number of missing features is large. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI to classification problems with some adjustments. We validate our approach with experiments on various scenarios, which show that PCAI and PIC can work with various imputation algorithms, including state-of-the-art ones, and improve imputation speed significantly while achieving competitive mean square error/classification accuracy compared to direct imputation (i.e., imputing directly on the missing data).
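As a rough sketch of the PCAI idea, assuming NumPy: PCA is applied to the fully observed features, and the missing features are then imputed from the resulting principal component scores rather than from the original features. The function name `pcai_impute` is hypothetical, and a per-column least-squares regression on the scores stands in for the imputation techniques the paper plugs in; this sketch handles only numeric missing features.

```python
import numpy as np


def pcai_impute(observed_feats, missing_feats, n_components):
    """Hypothetical PCAI sketch: reduce the fully observed features with PCA,
    then impute each missing feature from the component scores."""
    centered = observed_feats - observed_feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    # Intercept column plus the top principal component scores as predictors.
    design = np.column_stack([np.ones(len(scores)), scores])
    filled = missing_feats.copy()
    for j in range(missing_feats.shape[1]):
        miss = np.isnan(missing_feats[:, j])
        if not miss.any():
            continue
        # Stand-in imputer (assumption): least-squares fit on observed rows.
        coef, *_ = np.linalg.lstsq(design[~miss], missing_feats[~miss, j], rcond=None)
        filled[miss, j] = design[miss] @ coef
    return filled
```

The regression uses only `n_components` predictors instead of every observed feature, which illustrates the intuition behind the speed-up: the imputer operates on a smaller design matrix.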
Submitted 19 March, 2023; v1 submitted 30 May, 2022;
originally announced May 2022.
-
A VM/Containerized Approach for Scaling TinyML Applications
Authors:
Meelis Lootus,
Kartik Thakore,
Sam Leroux,
Geert Trooskens,
Akshay Sharma,
Holly Ly
Abstract:
Although deep neural networks are typically computationally expensive to use, technological advances in the design of both hardware platforms and neural network architectures have made it possible to use powerful models on edge devices. To enable widespread adoption of edge-based machine learning, we introduce a set of open-source tools that make it easy to deploy, update and monitor machine learning models on a wide variety of edge devices. Our tools bring the concept of containerization to the TinyML world. We propose to package ML and application logic as containers called Runes to deploy onto edge devices. The containerization allows us to target a fragmented Internet-of-Things (IoT) ecosystem by providing a common platform for Runes to run across devices.
Submitted 10 February, 2022;
originally announced February 2022.
-
Joint Optimization of Execution Latency and Energy Consumption for Mobile Edge Computing with Data Compression and Task Allocation
Authors:
Minh Hoang Ly,
Thinh Quang Dinh,
Ha Hoang Kha
Abstract:
In this paper, we consider the mobile edge offloading scenario consisting of one mobile device (MD) with multiple independent tasks and various remote edge devices. In order to save energy, the user's device can offload the tasks to available access points for edge computing. Data compression is applied to reduce the offloaded data size prior to wireless transmission in order to minimize the execution latency. We formulate the problem of jointly optimizing the task allocation decision and the data compression ratio to simultaneously minimize the total execution latency of the tasks and the MD's energy consumption. We show that the design problem is non-convex but can be transformed into a convex one through a semidefinite relaxation (SDR) based approach. Numerical simulations demonstrate that the proposed scheme outperforms the benchmark scheme.
Submitted 10 October, 2019; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Secure Symmetrical Multilevel Diversity Coding
Authors:
Anantharaman Balasubramanian,
Hung D. Ly,
Shuo Li,
Tie Liu,
Scott L. Miller
Abstract:
Symmetrical Multilevel Diversity Coding (SMDC) is a network compression problem introduced by Roche (1992) and Yeung (1995). In this setting, a simple separate coding strategy known as superposition coding was shown to be optimal in terms of achieving the minimum sum rate (Roche, Yeung, and Hau, 1997) and the entire admissible rate region (Yeung and Zhang, 1999) of the problem. This paper considers a natural generalization of SMDC to the secure communication setting with an additional eavesdropper. All sources must be kept perfectly secret from the eavesdropper as long as the number of encoder outputs available to the eavesdropper is no more than a given threshold. First, the problem of encoding individual sources is studied. A precise characterization of the entire admissible rate region is established via a connection to the problem of secure coding over a three-layer wiretap network and by utilizing some basic polyhedral structure of the admissible rate region. Building on this result, it is then shown that superposition coding remains optimal in terms of achieving the minimum sum rate for the general secure SMDC problem.
Submitted 9 January, 2012;
originally announced January 2012.
-
Security Embedding Codes
Authors:
Hung D. Ly,
Tie Liu,
Yufei Blankenship
Abstract:
This paper considers the problem of simultaneously communicating two messages, a high-security message and a low-security message, to a legitimate receiver, referred to as the security embedding problem. An information-theoretic formulation of the problem is presented. A coding scheme that combines rate splitting, superposition coding, nested binning and channel prefixing is considered and is shown to achieve the secrecy capacity region of the channel in several scenarios. Specializing these results to both scalar and independent parallel Gaussian channels (under an average individual per-subchannel power constraint), it is shown that the high-security message can be embedded into the low-security message at full rate (as if the low-security message did not exist) without incurring any loss on the overall rate of communication (as if both messages were low-security messages). Extensions to the wiretap channel II setting of Ozarow and Wyner are also considered, where it is shown that "perfect" security embedding can be achieved by an encoder that uses a two-level coset code.
Submitted 7 February, 2011;
originally announced February 2011.
-
Multiple-Input Multiple-Output Gaussian Broadcast Channels with Common and Confidential Messages
Authors:
Hung D. Ly,
Tie Liu,
Yingbin Liang
Abstract:
This paper considers the problem of the multiple-input multiple-output (MIMO) Gaussian broadcast channel with two receivers (receivers 1 and 2) and two messages: a common message intended for both receivers and a confidential message intended only for receiver 1 but needing to be kept asymptotically perfectly secure from receiver 2. A matrix characterization of the secrecy capacity region is established via a channel enhancement argument. The enhanced channel is constructed by first splitting receiver 1 into two virtual receivers and then enhancing only the virtual receiver that decodes the confidential message. The secrecy capacity region of the enhanced channel is characterized using an extremal entropy inequality previously established for characterizing the capacity region of a degraded compound MIMO Gaussian broadcast channel.
Submitted 15 July, 2009;
originally announced July 2009.