Search | arXiv e-print repository

Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data

Abstract: Feature-distributed data, referred to data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the featu… ▽ More Feature-distributed data, referred to data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices. △ Less

Submitted 10 March, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

arXiv:2209.09388 [pdf]

An Owner-managed Indirect-Permission Social Authentication Method for Private Key Recovery

Authors: Wei-Hsin Chang, Ren-Song Tsay

Abstract: In this paper, we propose a very secure and reliable owner-self-managed private key recovery method. In recent years, Public Key Authentication (PKA) method has been identified as the most feasible online security solution. However, losing the private key also implies the risk of losing the ownership of the assets associated with the private key. For key protection, the commonly adopted something-… ▽ More In this paper, we propose a very secure and reliable owner-self-managed private key recovery method. In recent years, Public Key Authentication (PKA) method has been identified as the most feasible online security solution. However, losing the private key also implies the risk of losing the ownership of the assets associated with the private key. For key protection, the commonly adopted something-you-x solutions require a new secret to protect the target secret and fall into a circular protection issue as the new secret has to be protected too. To resolve the circular protection issue and provide a truly secure and reliable solution, we propose separating the permission and possession of the private key. Then we create secret shares of the permission using the open public keys of selected trustees while having the owner possess the permission-encrypted private key. Then by applying the social authentication method, one may easily retrieve the permission to recover the private key. Our analysis shows that our proposed indirect-permission method is six orders of magnitude more secure and reliable than △ Less

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2112.01660 [pdf, other]

The Influence of Data Pre-processing and Post-processing on Long Document Summarization

Authors: Xinwei Du, Kailun Dong, Yuchen Zhang, Yongsheng Li, Ruei-Yu Tsay

Abstract: Long document summarization is an important and hard task in the field of natural language processing. A good performance of the long document summarization reveals the model has a decent understanding of the human language. Currently, most researches focus on how to modify the attention mechanism of the transformer to achieve a higher ROUGE score. The study of data pre-processing and post-process… ▽ More Long document summarization is an important and hard task in the field of natural language processing. A good performance of the long document summarization reveals the model has a decent understanding of the human language. Currently, most researches focus on how to modify the attention mechanism of the transformer to achieve a higher ROUGE score. The study of data pre-processing and post-processing are relatively few. In this paper, we use two pre-processing methods and a post-processing method and analyze the effect of these methods on various long document summarization models. △ Less

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2110.06413 [pdf]

3LSAA: A Secure And Privacy-preserving Zero-knowledge-based Data-sharing Approach Under An Untrusted Environment

Authors: Wei-Yi Kuo, Ren-Song Tsay

Abstract: As data collection and analysis become critical functions for many cloud applications, proper data sharing with approved parties is required. However, the traditional data sharing scheme through centralized data escrow servers may sacrifice owners' privacy and is weak in security. Mainly, the servers physically own all data while the original data owners have only virtual ownership and lose actual… ▽ More As data collection and analysis become critical functions for many cloud applications, proper data sharing with approved parties is required. However, the traditional data sharing scheme through centralized data escrow servers may sacrifice owners' privacy and is weak in security. Mainly, the servers physically own all data while the original data owners have only virtual ownership and lose actual access control. Therefore, we propose a 3-layer SSE-ABE-AES (3LSAA) cryptography-based privacy-protected data-sharing protocol based on the assumption that servers are honest-but-curious. The 3LSAA protocol realizes automatic access control management and convenient file search even if the server is not trustable. Besides achieving data self-sovereignty, our approach also improves system usability, eliminates the defects in the traditional SSE and ABE approaches, and provides a local AES key recovery method for user's availability. △ Less

Submitted 12 October, 2021; originally announced October 2021.

Comments: 21 pages, 7 figures

arXiv:2109.05896 [pdf]

A Precise Program Phase Identification Method Based on Frequency Domain Analysis

Authors: Hsuan-Yi Lin, Ren-Song Tsay

Abstract: In this paper, we present a systematic approach that transforms the program execution trace into the frequency domain and precisely identifies program phases. The analyzed results can be embedded into program code to mark the starting point and execution characteristics, such as CPI (Cycles per Instruction), of each phase. The so generated information can be applied to runtime program phase predic… ▽ More In this paper, we present a systematic approach that transforms the program execution trace into the frequency domain and precisely identifies program phases. The analyzed results can be embedded into program code to mark the starting point and execution characteristics, such as CPI (Cycles per Instruction), of each phase. The so generated information can be applied to runtime program phase prediction. With the precise program phase information, more intelligent software and system optimization techniques can be further explored and developed. △ Less