-
Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data
Authors:
Shuo-Chieh Huang,
Ruey S. Tsay
Abstract:
Feature-distributed data, that is, data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.
Submitted 10 March, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
An Owner-managed Indirect-Permission Social Authentication Method for Private Key Recovery
Authors:
Wei-Hsin Chang,
Ren-Song Tsay
Abstract:
In this paper, we propose a very secure and reliable owner-self-managed private key recovery method. In recent years, the Public Key Authentication (PKA) method has been identified as the most feasible online security solution. However, losing the private key also implies the risk of losing ownership of the assets associated with the private key. For key protection, the commonly adopted something-you-x solutions require a new secret to protect the target secret and fall into a circular protection issue, as the new secret has to be protected too. To resolve the circular protection issue and provide a truly secure and reliable solution, we propose separating the permission and possession of the private key. We then create secret shares of the permission using the open public keys of selected trustees while having the owner possess the permission-encrypted private key. By applying the social authentication method, one may then easily retrieve the permission to recover the private key. Our analysis shows that our proposed indirect-permission method is six orders of magnitude more secure and reliable than
Submitted 19 September, 2022;
originally announced September 2022.
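The abstract above mentions creating secret shares of a permission and distributing them among trustees. As a rough illustration of what threshold secret sharing looks like, here is a minimal sketch of Shamir's scheme over a prime field. This is an assumed stand-in, not the authors' construction: the abstract does not say which sharing scheme is used, and the step of encrypting each share with a trustee's public key is omitted. The names `PRIME`, `split_secret`, and `recover_secret` are hypothetical.

```python
import secrets

# Field modulus for the sketch (2**127 - 1 is a Mersenne prime).
# The choice of field is an assumption, not taken from the paper.
PRIME = 2**127 - 1

def split_secret(secret, n_shares, threshold):
    """Split `secret` into n_shares points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, n_shares + 1)]

def recover_secret(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

# Five hypothetical trustees, any three of whom can restore the permission.
permission = 123456789
shares = split_secret(permission, n_shares=5, threshold=3)
restored = recover_secret(shares[:3])
```

In a scheme like the one described, each share would additionally be encrypted under one trustee's public key before distribution; that layer is left out here to keep the sketch short.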
-
The Influence of Data Pre-processing and Post-processing on Long Document Summarization
Authors:
Xinwei Du,
Kailun Dong,
Yuchen Zhang,
Yongsheng Li,
Ruei-Yu Tsay
Abstract:
Long document summarization is an important and hard task in the field of natural language processing. Good performance on long document summarization indicates that the model has a decent understanding of human language. Currently, most research focuses on how to modify the attention mechanism of the transformer to achieve a higher ROUGE score. Studies of data pre-processing and post-processing are relatively few. In this paper, we use two pre-processing methods and a post-processing method and analyze the effect of these methods on various long document summarization models.
Submitted 2 December, 2021;
originally announced December 2021.
-
3LSAA: A Secure And Privacy-preserving Zero-knowledge-based Data-sharing Approach Under An Untrusted Environment
Authors:
Wei-Yi Kuo,
Ren-Song Tsay
Abstract:
As data collection and analysis become critical functions for many cloud applications, proper data sharing with approved parties is required. However, the traditional data sharing scheme through centralized data escrow servers may sacrifice owners' privacy and is weak in security. In particular, the servers physically own all data, while the original data owners have only virtual ownership and lose actual access control. Therefore, we propose a 3-layer SSE-ABE-AES (3LSAA) cryptography-based privacy-protected data-sharing protocol based on the assumption that servers are honest-but-curious. The 3LSAA protocol realizes automatic access control management and convenient file search even if the server is not trustworthy. Besides achieving data self-sovereignty, our approach also improves system usability, eliminates the defects in the traditional SSE and ABE approaches, and provides a local AES key recovery method to preserve availability for users.
Submitted 12 October, 2021;
originally announced October 2021.
-
A Precise Program Phase Identification Method Based on Frequency Domain Analysis
Authors:
Hsuan-Yi Lin,
Ren-Song Tsay
Abstract:
In this paper, we present a systematic approach that transforms the program execution trace into the frequency domain and precisely identifies program phases. The analyzed results can be embedded into program code to mark the starting point and execution characteristics, such as CPI (Cycles per Instruction), of each phase. The information generated in this way can be applied to runtime program phase prediction. With the precise program phase information, more intelligent software and system optimization techniques can be further explored and developed.
Submitted 9 September, 2021;
originally announced September 2021.
-
An Effective Early Multi-core System Shared Cache Design Method Based on Reuse-distance Analysis
Authors:
Hsin-Yu Ho,
Ren-Song Tsay
Abstract:
In this paper, we propose an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Since data traces are independent of system hardware architectures, a designer can easily compute the best cache design at the early system design phase using our approach. We devise a very efficient yet accurate method to derive the aggregated reuse-distance histograms of concurrent applications for accurate cache performance analysis and optimization. Essentially, the actual shared-cache contention results of concurrent applications are embedded in the aggregated reuse-distance histograms, and therefore the approach is very effective. The experimental results show that the average error rate of the shared-cache miss-count estimations of our approach is less than 2.4%. Using a simple scanning search method, one can easily determine the true optimal cache configurations at the early system design phase.
Submitted 9 September, 2021;
originally announced September 2021.
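To make the reuse-distance idea in the abstract above concrete, here is a minimal single-trace sketch. It assumes the standard textbook definition (the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address) and a fully associative LRU cache. It does not reproduce the authors' aggregation of histograms across concurrent applications. The function names and the toy trace are hypothetical.

```python
from collections import Counter

def reuse_distance_histogram(trace):
    """Return a Counter mapping reuse distance -> number of accesses.

    First-time accesses (cold misses) are counted under the key None.
    """
    stack = []        # LRU stack: most recently used address is last
    hist = Counter()
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Distinct addresses touched since the last access to addr.
            hist[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            hist[None] += 1
        stack.append(addr)
    return hist

def lru_miss_count(hist, capacity):
    """Misses of a fully associative LRU cache that holds `capacity` blocks.

    An access hits exactly when its reuse distance is below the capacity.
    """
    return sum(n for d, n in hist.items() if d is None or d >= capacity)

trace = ["a", "b", "c", "a", "b", "d", "a"]
hist = reuse_distance_histogram(trace)

# A simple scan over candidate sizes, reusing one histogram for all of them.
misses_by_size = {size: lru_miss_count(hist, size) for size in range(1, 5)}
```

The histogram is computed once from the trace and then reused for every candidate cache size, which is why a scan over configurations is cheap compared with re-simulating each one.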
-
A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method Based on Reuse-Distance Analysis
Authors:
Cheng-Lin Tsai,
Ren-Song Tsay
Abstract:
In this paper, we propose a practical and effective approach allowing designers to optimize multi-level cache size at the early system design phase. Our key contribution is to generalize the reuse distance analysis method and develop an effective and practical cache design optimization approach. We adopt a simple scanning search method to locate optimal cache solutions in terms of cache size, power consumption, or average data access delay. The proposed approach is particularly useful for early-phase system designers and is verified to be 150 to 250 times faster than the traditional simulation-based approach. In addition, we introduce a simplified analytical model and provide designers with insights into how cache design parameters may affect the expected results. As a result, designers can make an adequate decision in the early system design phase.
Submitted 9 September, 2021;
originally announced September 2021.
-
Analytical Process Scheduling Optimization for Heterogeneous Multi-core Systems
Authors:
Chien-Hao Chen,
Ren-Song Tsay
Abstract:
In this paper, we propose the first optimum process scheduling algorithm for an increasingly prevalent type of heterogeneous multicore (HEMC) system that combines high-performance big cores and energy-efficient small cores with the same instruction-set architecture (ISA). Existing algorithms are all heuristics-based, and the well-known IPC-driven approach essentially tries to schedule processes with high scaling factors on big cores. Our analysis shows that, for optimum solutions, it is also critical to consider placing long-running processes on big cores. Tests of SPEC 2006 cases on various big-small core combinations show that our proposed optimum approach is up to 34% faster than the IPC-driven heuristic approach in terms of total workload completion time. The complexity of our algorithm is O(N log N), where N is the number of processes. Therefore, the proposed optimum algorithm is practical for use.
Submitted 9 September, 2021;
originally announced September 2021.
-
Automatic Timing-Coherent Transactor Generation for Mixed-level Simulations
Authors:
Li-Chun Chen,
Hsin-I Wu,
Ren-Song Tsay
Abstract:
In this paper, we extend the concept of the traditional transactor, which focuses on correct content transfer, to a new timing-coherent transactor that also accurately aligns the timing of each transaction boundary. Designers can then perform precise concurrent system behavior analysis in mixed-abstraction-level system simulations, which are essential to increasingly complex system designs. To streamline the process, we also develop an automatic approach for timing-coherent transactor generation. We apply our approach in mixed-level simulations, and the results show that it achieves 100% timing accuracy, while the conventional approach produces error rates of 25% to 44%.
Submitted 9 September, 2021;
originally announced September 2021.
-
An Effective Parallel Program Debugging Approach Based on Timing Annotation
Authors:
Yun Chang,
Hsin-I Wu,
Ren-Song Tsay
Abstract:
We propose an effective parallel program debugging approach based on the timing annotation technique. With prevalent multi-core platforms, parallel programming is required to fully utilize the computing power. However, non-determinism and the associated concurrency bugs are notorious and remain a great challenge to designers. We therefore propose an effective program debugging approach using the timing annotation technique derived from the deterministic Multi-Core Instruction Set Simulation (MCISS) technology. With it, we construct a deterministic execution environment for parallel program debugging and devise a few unique, effective, and easy-to-use parallel debugging functions. We modify QEMU and GDB to implement and demonstrate our proposed idea. The usage of our debugger is almost identical to that of the conventional GDB debugger, so users can learn the tool with little effort.
Submitted 9 September, 2021;
originally announced September 2021.
-
A High-Performance Adaptive Quantization Approach for Edge CNN Applications
Authors:
Hsu-Hsun Chin,
Ren-Song Tsay,
Hsin-I Wu
Abstract:
Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications. However, the enhanced accuracy comes at the cost of substantial memory bandwidth and storage requirements and demanding computational resources. Although quantization methods have effectively reduced the deployment cost for edge devices, they suffer from significant information loss when processing the biased activations of contemporary CNNs. In this paper, we therefore introduce an adaptive high-performance quantization method to resolve the issue of biased activations by dynamically adjusting the scaling and shifting factors based on the task loss. Our proposed method has been extensively evaluated on image classification models (ResNet-18/34/50, MobileNet-V2, EfficientNet-B0) with the ImageNet dataset, an object detection model (YOLO-V4) with the COCO dataset, and language models with the PTB dataset. The results show that our 4-bit integer (INT4) quantization models achieve better accuracy than the state-of-the-art 4-bit models and, in some cases, even surpass the golden full-precision models. The final designs have been successfully deployed onto extremely resource-constrained edge devices for many practical applications.
Submitted 18 July, 2021;
originally announced July 2021.
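The role of the scaling and shifting factors mentioned in the abstract above can be illustrated with a small sketch of INT4 quantization with and without a shift. This is an assumed simplification: the factors here come from a plain min/max calibration, whereas the paper adjusts them dynamically based on the task loss, which is not reproduced. All function names and the sample activations are hypothetical.

```python
def fit_scale_shift(values):
    """Min/max calibration (an assumption; the paper adapts via task loss)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0   # 16 INT4 levels span 15 steps
    shift = lo + 8 * scale          # maps lo to code -8 and hi to code 7
    return scale, shift

def quantize_int4(values, scale, shift):
    """Map floats to signed 4-bit codes in [-8, 7] after removing the shift."""
    return [max(-8, min(7, round((v - shift) / scale))) for v in values]

def dequantize(codes, scale, shift):
    """Map INT4 codes back to approximate float values."""
    return [q * scale + shift for q in codes]

def max_error(values, scale, shift):
    """Largest absolute round-trip error over the given values."""
    restored = dequantize(quantize_int4(values, scale, shift), scale, shift)
    return max(abs(v - r) for v, r in zip(values, restored))

# Biased activations: all positive, far from being centred on zero.
activations = [1.0, 1.3, 2.2, 3.1, 4.0]
scale, shift = fit_scale_shift(activations)
codes = quantize_int4(activations, scale, shift)
shifted_err = max_error(activations, scale, shift)

# Symmetric quantization with no shift wastes the negative half of the range.
unshifted_err = max_error(activations, max(abs(v) for v in activations) / 7, 0.0)
```

On this toy input the shifted variant uses all sixteen codes for the occupied range, so its worst-case round-trip error is smaller than that of the unshifted variant.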
-
A Very Compact Embedded CNN Processor Design Based on Logarithmic Computing
Authors:
Tsung-Ying Lu,
Hsu-Hsun Chin,
Hsin-I Wu,
Ren-Song Tsay
Abstract:
In this paper, we propose a very compact embedded CNN processor design based on a modified logarithmic computing method using very low bit-width representation. Our high-quality CNN processor can easily fit into edge devices. For Yolov2, our processing circuit takes only 0.15 mm² using the TSMC 40 nm cell library. The key idea is to constrain the activation and weight values of all layers uniformly to be within the range [-1, 1] and produce a low bit-width logarithmic representation. With the uniform representations, we devise a unified, reusable CNN computing kernel and significantly reduce computing resources. The proposed approach has been extensively evaluated on many popular image classification CNN models (AlexNet, VGG16, and ResNet-18/34) and an object detection model (Yolov2). The hardware-implemented results show that our design consumes only minimal computing and storage resources, yet attains very high accuracy. The design is thoroughly verified on FPGAs, and the SoC integration is underway with promising results. With extremely efficient resource and energy usage, our design is excellent for edge computing purposes.
Submitted 13 October, 2020;
originally announced October 2020.
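As a rough illustration of why a logarithmic representation of values in [-1, 1] saves hardware, the sketch below encodes each value as a sign and a power-of-two exponent, so that a product becomes an exponent addition. This is plain base-2 logarithmic quantization, assumed for illustration only; the paper's modified method and its exact bit allocation are not described in the abstract. The names `log_quantize`, `log_multiply`, `decode`, and the `exp_bits` parameter are hypothetical.

```python
import math

def log_quantize(v, exp_bits=3):
    """Encode v in [-1, 1] as (sign, e), meaning sign * 2**(-e).

    The code (0, None) represents zero; one exponent code is reserved for it.
    """
    max_e = 2 ** exp_bits - 2
    if v == 0:
        return (0, None)
    e = round(-math.log2(abs(v)))
    if e > max_e:
        return (0, None)          # magnitude too small: flush to zero
    return (1 if v > 0 else -1, max(0, e))

def log_multiply(a, b):
    """Multiply two codes: signs multiply and exponents add."""
    if a[1] is None or b[1] is None:
        return (0, None)
    return (a[0] * b[0], a[1] + b[1])

def decode(code):
    """Convert a (sign, e) code back to a float."""
    sign, e = code
    return 0.0 if e is None else sign * 2.0 ** (-e)

weight = log_quantize(0.5)
activation = log_quantize(-0.25)
product = decode(log_multiply(weight, activation))
```

Because every operand is a signed power of two, the multiplier in a multiply-accumulate unit can in principle be replaced by an adder and a shifter, which is the kind of saving a logarithmic design aims for.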
-
Tensor Canonical Correlation Analysis with Convergence and Statistical Guarantees
Authors:
You-Lin Chen,
Mladen Kolar,
Ruey S. Tsay
Abstract:
In many applications, such as classification of images or videos, it is of interest to develop a framework for tensor data instead of an ad-hoc way of transforming data to vectors, due to the computational and under-sampling issues. In this paper, we study convergence and statistical properties of two-dimensional canonical correlation analysis \citep{Lee2007Two} under the assumption that the data come from a probabilistic model. We show that the power method, when carefully initialized, converges to the optimum, and we provide a finite sample bound. Then we extend this framework to tensor-valued data and propose the higher-order power method, which is commonly used in tensor decomposition, to extract the canonical directions. Our method can be used effectively in a large-scale data setting by solving the inner least squares problem with stochastic gradient descent, and we justify convergence via the theory of Lojasiewicz's inequalities without any assumption on the data generating process or initialization. For practical applications, we further develop (a) an inexact updating scheme which allows us to use the state-of-the-art stochastic gradient descent algorithm, (b) an effective initialization scheme which alleviates the problem of local optima in non-convex optimization, and (c) a deflation procedure for extracting several canonical components. Empirical analyses on challenging data, including gene expression and air pollution indexes in Taiwan, show the effectiveness and efficiency of the proposed methodology. Our results fill a missing, but crucial, part in the literature on tensor data.
Submitted 11 November, 2020; v1 submitted 12 June, 2019;
originally announced June 2019.