-
Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme
Authors:
Pi-Wei Chen,
Jerry Chun-Wei Lin,
Jia Ji,
Feng-Hao Yeh,
Chao-Chun Chen
Abstract:
Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through…
▽ More
Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through data-driven methods, eliminating the need for human intervention. The primary challenge in this approach is the lack of anomalous samples during the training phase. Additionally, the Vision Transformer (ViT)-based image encoder in VLMs is not ideal for pixel-wise anomaly segmentation due to a locality feature mismatch between the original image and the output feature map. To tackle the first challenge, we have developed the Object-Attention Anomaly Generation Module (OAGM) to synthesize anomaly samples for training. Furthermore, our Meta-Guiding Prompt-Tuning Scheme (MPTS) iteratively adjusts the gradient-based optimization direction of learnable prompts to avoid overfitting to the synthesized anomalies. For the second challenge, we propose Locality-Aware Attention, which ensures that each local patch feature attends only to nearby patch features, preserving the locality features corresponding to their original locations. This framework allows for the optimal prompt embeddings by searching in the continuous latent space via backpropagation, free from human semantic constraints. Additionally, the modified locality-aware attention improves the precision of pixel-wise anomaly segmentation.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Feature Purified Transformer With Cross-level Feature Guiding Decoder For Multi-class OOD and Anomaly Deteciton
Authors:
Jerry Chun-Wei Lin,
Pi-Wei Chen,
Chao-Chun Chen
Abstract:
Reconstruction networks are prevalently used in unsupervised anomaly and Out-of-Distribution (OOD) detection due to their independence from labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities, which allow anomalies to blend within the expanded boundaries of normality resulting fro…
▽ More
Reconstruction networks are prevalently used in unsupervised anomaly and Out-of-Distribution (OOD) detection due to their independence from labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities, which allow anomalies to blend within the expanded boundaries of normality resulting from the added categories, thereby reducing detection accuracy. We introduce the FUTUREG framework, which incorporates two innovative modules: the Feature Purification Module (FPM) and the CFG Decoder. The FPM constrains the normality boundary within the latent space to effectively filter out anomalous features, while the CFG Decoder uses layer-wise encoder representations to guide the reconstruction of filtered features, preserving fine-grained details. Together, these modules enhance the reconstruction error for anomalies, ensuring high-quality reconstructions for normal samples. Our results demonstrate that FUTUREG achieves state-of-the-art performance in multi-class OOD settings and remains competitive in industrial anomaly detection scenarios.
△ Less
Submitted 30 April, 2024;
originally announced June 2024.
-
Large Language Models in Education: Vision and Opportunities
Authors:
Wensheng Gan,
Zhenlian Qi,
Jiayang Wu,
Jerry Chun-Wei Lin
Abstract:
With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications…
▽ More
With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications of LLMs in the field of digital/smart education have broad prospects. The research on educational large models (EduLLMs) is constantly evolving, providing new methods and approaches to achieve personalized learning, intelligent tutoring, and educational assessment goals, thereby improving the quality of education and the learning experience. This article aims to investigate and summarize the application of LLMs in smart education. It first introduces the research background and motivation of LLMs and explains the essence of LLMs. It then discusses the relationship between digital education and EduLLMs and summarizes the current research status of educational large models. The main contributions are the systematic summary and vision of the research background, motivation, and application of large models for education (LLM4Edu). By reviewing existing research, this article provides guidance and insights for educators, researchers, and policy-makers to gain a deep understanding of the potential and challenges of LLM4Edu. It further provides guidance for further advancing the development and application of LLM4Edu, while still facing technical, ethical, and practical challenges requiring further research and exploration.
△ Less
Submitted 22 November, 2023;
originally announced November 2023.
-
HTPS: Heterogeneous Transferring Prediction System for Healthcare Datasets
Authors:
Jia-Hao Syu,
Jerry Chun-Wei Lin,
Marcin Fojcik,
Rafał Cupek
Abstract:
Medical internet of things leads to revolutionary improvements in medical services, also known as smart healthcare. With the big healthcare data, data mining and machine learning can assist wellness management and intelligent diagnosis, and achieve the P4-medicine. However, healthcare data has high sparsity and heterogeneity. In this paper, we propose a Heterogeneous Transferring Prediction System…
▽ More
Medical internet of things leads to revolutionary improvements in medical services, also known as smart healthcare. With the big healthcare data, data mining and machine learning can assist wellness management and intelligent diagnosis, and achieve the P4-medicine. However, healthcare data has high sparsity and heterogeneity. In this paper, we propose a Heterogeneous Transferring Prediction System (HTPS). Feature engineering mechanism transforms the dataset into sparse and dense feature matrices, and autoencoders in the embedding networks not only embed features but also transfer knowledge from heterogeneous datasets. Experimental results show that the proposed HTPS outperforms the benchmark systems on various prediction tasks and datasets, and ablation studies present the effectiveness of each designed mechanism. Experimental results demonstrate the negative impact of heterogeneous data on benchmark systems and the high transferability of the proposed HTPS.
△ Less
Submitted 2 May, 2023;
originally announced May 2023.
-
A DNA Based Colour Image Encryption Scheme Using A Convolutional Autoencoder
Authors:
Fawad Ahmed,
Muneeb Ur Rehman,
Jawad Ahmad,
Muhammad Shahbaz Khan,
Wadii Boulila,
Gautam Srivastava,
Jerry Chun-Wei Lin,
William J. Buchanan
Abstract:
With the advancement in technology, digital images can easily be transmitted and stored over the Internet. Encryption is used to avoid illegal interception of digital images. Encrypting large-sized colour images in their original dimension generally results in low encryption/decryption speed along with exerting a burden on the limited bandwidth of the transmission channel. To address the aforement…
▽ More
With the advancement in technology, digital images can easily be transmitted and stored over the Internet. Encryption is used to avoid illegal interception of digital images. Encrypting large-sized colour images in their original dimension generally results in low encryption/decryption speed along with exerting a burden on the limited bandwidth of the transmission channel. To address the aforementioned issues, a new encryption scheme for colour images employing convolutional autoencoder, DNA and chaos is presented in this paper. The proposed scheme has two main modules, the dimensionality conversion module using the proposed convolutional autoencoder, and the encryption/decryption module using DNA and chaos. The dimension of the input colour image is first reduced from N $\times$ M $\times$ 3 to P $\times$ Q gray-scale image using the encoder. Encryption and decryption are then performed in the reduced dimension space. The decrypted gray-scale image is upsampled to obtain the original colour image having dimension N $\times$ M $\times$ 3. The training and validation accuracy of the proposed autoencoder is 97% and 95%, respectively. Once the autoencoder is trained, it can be used to reduce and subsequently increase the dimension of any arbitrary input colour image. The efficacy of the designed autoencoder has been demonstrated by the successful reconstruction of the compressed image into the original colour image with negligible perceptual distortion. The second major contribution presented in this paper is an image encryption scheme using DNA along with multiple chaotic sequences and substitution boxes. The security of the proposed image encryption algorithm has been gauged using several evaluation parameters, such as histogram of the cipher image, entropy, NPCR, UACI, key sensitivity, contrast, etc. encryption.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Itemset Utility Maximization with Correlation Measure
Authors:
Jiahui Chen,
Yixin Xu,
Shicheng Wan,
Wensheng Gan,
Jerry Chun-Wei Lin
Abstract:
As an important data mining technology, high utility itemset mining (HUIM) is used to find out interesting but hidden information (e.g., profit and risk). HUIM has been widely applied in many application scenarios, such as market analysis, medical detection, and web click stream analysis. However, most previous HUIM approaches often ignore the relationship between items in an itemset. Therefore, m…
▽ More
As an important data mining technology, high utility itemset mining (HUIM) is used to find out interesting but hidden information (e.g., profit and risk). HUIM has been widely applied in many application scenarios, such as market analysis, medical detection, and web click stream analysis. However, most previous HUIM approaches often ignore the relationship between items in an itemset. Therefore, many irrelevant combinations (e.g., \{gold, apple\} and \{notebook, book\}) are discovered in HUIM. To address this limitation, many algorithms have been proposed to mine correlated high utility itemsets (CoHUIs). In this paper, we propose a novel algorithm called the Itemset Utility Maximization with Correlation Measure (CoIUM), which considers both a strong correlation and the profitable values of the items. Besides, the novel algorithm adopts a database projection mechanism to reduce the cost of database scanning. Moreover, two upper bounds and four pruning strategies are utilized to effectively prune the search space. And a concise array-based structure named utility-bin is used to calculate and store the adopted upper bounds in linear time and space. Finally, extensive experimental results on dense and sparse datasets demonstrate that CoIUM significantly outperforms the state-of-the-art algorithms in terms of runtime and memory consumption.
△ Less
Submitted 26 August, 2022;
originally announced August 2022.
-
Generative Adversarial Networks for Image Super-Resolution: A Survey
Authors:
Chunwei Tian,
Xuanyu Zhang,
Jerry Chun-Wei Lin,
Wangmeng Zuo,
Yanning Zhang,
Chia-Wen Lin
Abstract:
Single image super-resolution (SISR) has played an important role in the field of image processing. Recent generative adversarial networks (GANs) can achieve excellent results on low-resolution images with small samples. However, there are little literatures summarizing different GANs in SISR. In this paper, we conduct a comparative study of GANs from different perspectives. We first take a look a…
▽ More
Single image super-resolution (SISR) has played an important role in the field of image processing. Recent generative adversarial networks (GANs) can achieve excellent results on low-resolution images with small samples. However, there are little literatures summarizing different GANs in SISR. In this paper, we conduct a comparative study of GANs from different perspectives. We first take a look at developments of GANs. Second, we present popular architectures for GANs in big and small samples for image applications. Then, we analyze motivations, implementations and differences of GANs based optimization methods and discriminative learning for image super-resolution in terms of supervised, semi-supervised and unsupervised manners. Next, we compare performance of these popular GANs on public datasets via quantitative and qualitative analysis in SISR. Finally, we highlight challenges of GANs and potential research points for SISR.
△ Less
Submitted 3 October, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
-
Secure Artificial Intelligence of Things for Implicit Group Recommendations
Authors:
Ke** Yu,
Zhiwei Guo,
Yu Shen,
Wei Wang,
Jerry Chun-Wei Lin,
Takuro Sato
Abstract:
The emergence of Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications such as group recommender systems. As distance among people has been greatly shortened, it has been a more general demand to provide personalized services to groups instead of individuals. In order to capture group-level preference features from individuals, existing methods…
▽ More
The emergence of Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications such as group recommender systems. As distance among people has been greatly shortened, it has been a more general demand to provide personalized services to groups instead of individuals. In order to capture group-level preference features from individuals, existing methods were mostly established via aggregation and face two aspects of challenges: secure data management workflow is absent, and implicit preference feedbacks is ignored. To tackle current difficulties, this paper proposes secure Artificial Intelligence of Things for implicit Group Recommendations (SAIoT-GR). As for hardware module, a secure IoT structure is developed as the bottom support platform. As for software module, collaborative Bayesian network model and non-cooperative game are can be introduced as algorithms. Such a secure AIoT architecture is able to maximize the advantages of the two modules. In addition, a large number of experiments are carried out to evaluate the performance of the SAIoT-GR in terms of efficiency and robustness.
△ Less
Submitted 23 April, 2021;
originally announced April 2021.
-
Utility Mining Across Multi-Sequences with Individualized Thresholds
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Jiexiong Zhang,
Philip S. Yu
Abstract:
Utility-oriented pattern mining has become an emerging topic since it can reveal high-utility patterns (e.g., itemsets, rules, sequences) from different types of data, which provides more information than the traditional frequent/confident-based pattern mining models. The utilities of various items are not exactly equal in realistic situations; each item has its own utility or importance. In gener…
▽ More
Utility-oriented pattern mining has become an emerging topic since it can reveal high-utility patterns (e.g., itemsets, rules, sequences) from different types of data, which provides more information than the traditional frequent/confident-based pattern mining models. The utilities of various items are not exactly equal in realistic situations; each item has its own utility or importance. In general, user considers a uniform minimum utility (minutil) threshold to identify the set of high-utility sequential patterns (HUSPs). This is unable to find the interesting patterns while the minutil is set extremely high or low. We first design a new utility mining framework namely USPT for mining high-Utility Sequential Patterns across multi-sequences with individualized Thresholds. Each item in the designed framework has its own specified minimum utility threshold. Based on the lexicographic-sequential tree and the utility-array structure, the USPT framework is presented to efficiently discover the HUSPs. With the upper-bounds on utility, several pruning strategies are developed to prune the unpromising candidates early in the search space. Several experiments are conducted on both real-life and synthetic datasets to show the performance of the designed USPT algorithm, and the results showed that USPT could achieve good effectiveness and efficiency for mining HUSPs with individualized minimum utility thresholds.
△ Less
Submitted 25 December, 2019;
originally announced December 2019.
-
Discovering High Utility Episodes in Sequences
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Han-Chieh Chao,
Philip S. Yu
Abstract:
Sequence data, e.g., complex event sequence, is more commonly seen than other types of data (e.g., transaction data) in real-world applications. For the mining task from sequence data, several problems have been formulated, such as sequential pattern mining, episode mining, and sequential rule mining. As one of the fundamental problems, episode mining has often been studied. The common wisdom is t…
▽ More
Sequence data, e.g., complex event sequence, is more commonly seen than other types of data (e.g., transaction data) in real-world applications. For the mining task from sequence data, several problems have been formulated, such as sequential pattern mining, episode mining, and sequential rule mining. As one of the fundamental problems, episode mining has often been studied. The common wisdom is that discovering frequent episodes is not useful enough. In this paper, we propose an efficient utility mining approach namely UMEpi: Utility Mining of high-utility Episodes from complex event sequence. We propose the concept of remaining utility of episode, and achieve a tighter upper bound, namely episode-weighted utilization (EWU), which will provide better pruning. Thus, the optimized EWU-based pruning strategies can achieve better improvements in mining efficiency. The search space of UMEpi w.r.t. a prefix-based lexicographic sequence tree is spanned and determined recursively for mining high-utility episodes, by prefix-spanning in a depth-first way. Finally, extensive experiments on four real-life datasets demonstrate that UMEpi can discover the complete high-utility episodes from complex event sequence, while the state-of-the-art algorithms fail to return the correct results. Furthermore, the improved variants of UMEpi significantly outperform the baseline in terms of execution time, memory consumption, and scalability.
△ Less
Submitted 25 December, 2019;
originally announced December 2019.
-
Utility-Driven Mining of Trend Information for Intelligent System
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Han-Chieh Chao,
Philippe Fournier-Viger,
Xuan Wang,
Philip S. Yu
Abstract:
Useful knowledge, embedded in a database, is likely to change over time. Identifying recent changes in temporal databases can provide valuable up-to-date information to decision-makers. Nevertheless, techniques for mining high-utility patterns (HUPs) seldom consider recency as a criterion to discover patterns. Thus, the traditional utility mining framework is inadequate for obtaining up-to-date in…
▽ More
Useful knowledge, embedded in a database, is likely to change over time. Identifying recent changes in temporal databases can provide valuable up-to-date information to decision-makers. Nevertheless, techniques for mining high-utility patterns (HUPs) seldom consider recency as a criterion to discover patterns. Thus, the traditional utility mining framework is inadequate for obtaining up-to-date insights about real world data. In this paper, we address this issue by introducing a novel framework, named utility-driven mining of Recent/trend high-Utility Patterns (RUP) in temporal databases for intelligent systems, based on user-specified minimum recency and minimum utility thresholds. The utility-driven RUP algorithm is based on novel global and conditional downward closure properties, and a recency-utility tree. Moreover, it adopts a vertical compact recency-utility list structure to store the information required by the mining process. The developed RUP algorithm recursively discovers recent HUPs. It is also fast and consumes a small amount of memory due to its pattern discovery approach that does not generate candidates. Two improved versions of the algorithm with additional pruning strategies are also designed to speed up the discovery of patterns by reducing the search space. Results of a substantial experimental evaluation show that the proposed algorithm can efficiently identify all recent high-utility patterns in large-scale databases, and that the improved algorithm performs best.
△ Less
Submitted 25 December, 2019;
originally announced December 2019.
-
Strongly chordal digraphs and $Γ$-free matrices
Authors:
Pavol Hell,
Cesar Hernandez-Cruz,
**g Huang,
Jephian C. -H. Lin
Abstract:
We define strongly chordal digraphs, which generalize strongly chordal graphs and chordal bipartite graphs, and are included in the class of chordal digraphs. They correspond to square 0,1 matrices that admit a simultaneous row and column permutation avoiding the Γ matrix. In general, it is not clear if these digraphs can be recognized in polynomial time, and we focus on symmetric digraphs (i.e.,…
▽ More
We define strongly chordal digraphs, which generalize strongly chordal graphs and chordal bipartite graphs, and are included in the class of chordal digraphs. They correspond to square 0,1 matrices that admit a simultaneous row and column permutation avoiding the Γ matrix. In general, it is not clear if these digraphs can be recognized in polynomial time, and we focus on symmetric digraphs (i.e., graphs with possible loops), tournaments with possible loops, and balanced digraphs. In each of these cases we give a polynomial-time recognition algorithm and a forbidden induced subgraph characterization. We also discuss an algorithm for minimum general dominating set in strongly chordal graphs with possible loops, extending and unifying similar algorithms for strongly chordal graphs and chordal bipartite graphs.
△ Less
Submitted 12 November, 2019; v1 submitted 8 September, 2019;
originally announced September 2019.
-
Fast Utility Mining on Complex Sequences
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Jiexiong Zhang,
Philippe Fournier-Viger,
Han-Chieh Chao,
Philip S. Yu
Abstract:
High-utility sequential pattern mining is an emerging topic in the field of Knowledge Discovery in Databases. It consists of discovering subsequences having a high utility (importance) in sequences, referred to as high-utility sequential patterns (HUSPs). HUSPs can be applied to many real-life applications, such as market basket analysis, E-commerce recommendation, click-stream analysis and scenic…
▽ More
High-utility sequential pattern mining is an emerging topic in the field of Knowledge Discovery in Databases. It consists of discovering subsequences having a high utility (importance) in sequences, referred to as high-utility sequential patterns (HUSPs). HUSPs can be applied to many real-life applications, such as market basket analysis, E-commerce recommendation, click-stream analysis and scenic route planning. For example, in economics and targeted marketing, understanding economic behavior of consumers is quite challenging, such as finding credible and reliable information on product profitability. Several algorithms have been proposed to address this problem by efficiently mining utility-based useful sequential patterns. Nevertheless, the performance of these algorithms can be unsatisfying in terms of runtime and memory usage due to the combinatorial explosion of the search space for low utility threshold and large databases. Hence, this paper proposes a more efficient algorithm for the task of high-utility sequential pattern mining, called HUSP-ULL. It utilizes a lexicographic sequence (LS)-tree and a utility-linked (UL)-list structure to fast discover HUSPs. Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper-bounds on the utility of candidate sequences, and reduce the search space by pruning unpromising candidates early. Substantial experiments both on real-life and synthetic datasets show that the proposed algorithm can effectively and efficiently discover the complete set of HUSPs and outperforms the state-of-the-art algorithms.
△ Less
Submitted 27 April, 2019;
originally announced April 2019.
-
ProUM: Projection-based Utility Mining on Sequence Data
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Jiexiong Zhang,
Han-Chieh Chao,
Hamido Fujita,
Philip S. Yu
Abstract:
Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has lead to the emergence of utility-oriented mining (also called utility mining) in the recent decade. Utility mining has attracted a great amount of attention, but most of the existing studies have been developed to deal with itemset-based data. Time-ordered sequence data is mo…
▽ More
Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has lead to the emergence of utility-oriented mining (also called utility mining) in the recent decade. Utility mining has attracted a great amount of attention, but most of the existing studies have been developed to deal with itemset-based data. Time-ordered sequence data is more commonly seen in real-world situations, which is different from itemset-based data. Since they are time-consuming and require large amount of memory usage, current utility mining algorithms still have limitations when dealing with sequence data. In addition, the mining efficiency of utility mining on sequence data still needs to be improved, especially for long sequences or when there is a low minimum utility threshold. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. The utility-array structure is designed to store the necessary information of the sequence-order and utility. ProUM can significantly improve the mining efficiency by utilizing the projection technique in generating utility-array, and it effectively reduces the memory consumption. Furthermore, a new upper bound named sequence extension utility is proposed and several pruning strategies are further applied to improve the efficiency of ProUM. By taking utility theory into account, the derived high-utility sequential patterns have more insightful and interesting information than other kinds of patterns. Experimental results showed that the proposed ProUM algorithm significantly outperformed the state-of-the-art algorithms in terms of execution time, memory usage, and scalability.
△ Less
Submitted 12 September, 2019; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Correlated Utility-based Pattern Mining
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Han-Chieh Chao,
Hamido Fujita,
Philip S. Yu
Abstract:
In the field of data mining and analytics, the utility theory from Economic can bring benefits in many real-life applications. In recent decade, a new research field called utility-oriented mining has already attracted great attention. Previous studies have, however, the limitation that they rarely consider the inherent correlation of items among patterns. Consider the purchase behaviors of consum…
▽ More
In the field of data mining and analytics, the utility theory from Economic can bring benefits in many real-life applications. In recent decade, a new research field called utility-oriented mining has already attracted great attention. Previous studies have, however, the limitation that they rarely consider the inherent correlation of items among patterns. Consider the purchase behaviors of consumer, a high-utility group of products (w.r.t. multi-products) may contain several very high-utility products with some low-utility products. However, it is considered as a valuable pattern even if this behavior/pattern may be not highly correlated, or even happen by chance. In this paper, in light of these challenges, we propose an efficient utility mining approach namely non-redundant Correlated high-Utility Pattern Miner (CoUPM) by taking positive correlation and profitable value into account. The derived patterns with high utility and strong positive correlation can lead to more insightful availability than those patterns only have high profitable values. The utility-list structure is revised and applied to store necessary information of both correlation and utility. Several pruning strategies are further developed to improve the efficiency for discovering the desired patterns. Experimental results show that the non-redundant correlated high-utility patterns have more effectiveness than some other kinds of interesting patterns. Moreover, efficiency of the proposed CoUPM algorithm significantly outperforms the state-of-the-art algorithm.
△ Less
Submitted 27 April, 2019; v1 submitted 5 April, 2019;
originally announced April 2019.
-
Utility-driven Data Analytics on Uncertain Data
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Han-Chieh Chao,
Athanasios V. Vasilakos,
Philip S. Yu
Abstract:
Modern Internet of Things (IoT) applications generate massive amounts of data, much of it in the form of objects/items of readings, events, and log entries. Specifically, most of the objects in these IoT data contain rich embedded information (e.g., frequency and uncertainty) and different level of importance (e.g., unit utility of items, interestingness, cost, risk, or weight). Many existing appr…
▽ More
Modern Internet of Things (IoT) applications generate massive amounts of data, much of it in the form of objects/items of readings, events, and log entries. Specifically, most of the objects in these IoT data contain rich embedded information (e.g., frequency and uncertainty) and different level of importance (e.g., unit utility of items, interestingness, cost, risk, or weight). Many existing approaches in data mining and analytics have limitations such as only the binary attribute is considered within a transaction, as well as all the objects/items having equal weights or importance. To solve these drawbacks, a novel utility-driven data analytics algorithm named HUPNU is presented, to extract High-Utility patterns by considering both Positive and Negative unit utilities from Uncertain data. The qualified high-utility patterns can be effectively discovered for risk prediction, manufacturing management, decision-making, among others. By using the developed vertical Probability-Utility list with the Positive-and-Negative utilities structure, as well as several effective pruning strategies. Experiments showed that the developed HUPNU approach performed great in mining the qualified patterns efficiently and effectively.
△ Less
Submitted 25 February, 2019;
originally announced February 2019.
-
Beyond Frequency: Utility Mining with Varied Item-Specific Minimum Utility
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Philippe Fournier-Viger,
Han-Chieh Chao,
Philip S Yu
Abstract:
Utility-oriented mining which integrates utility theory and data mining is a useful tool for understanding economic consumer behavior. Traditional algorithms for mining high-utility patterns (HUPs) applies a single/uniform minimum high-utility threshold (minutil) to obtain the set of HUPs, but in some real-life circumstances, some specific products may bring lower utilities compared with others, b…
▽ More
Utility-oriented mining which integrates utility theory and data mining is a useful tool for understanding economic consumer behavior. Traditional algorithms for mining high-utility patterns (HUPs) applies a single/uniform minimum high-utility threshold (minutil) to obtain the set of HUPs, but in some real-life circumstances, some specific products may bring lower utilities compared with others, but their profit may offer some vital information. However, if minutil is set high, the patterns with low minutil are missed; if minutil is set low, the number of patterns becomes unmanageable. In this paper, an efficient one-phase utility-oriented pattern mining algorithm, called HIMU, is proposed for mining HUPs with varied item-specific minimum utility. A novel tree structure called a multiple item utility set-enumeration tree (MIU-tree), the global sorted and the conditional downward closure properties are introduced in HIMU. In addition, we extended the compact utility-list structure to keep the necessary information, and thus this one-phase HIMU model greatly reduces the computational costs and memory requirements. Moreover, two pruning strategies are then extended to enhance the performance. We conducted extensive experiments in several synthetic and real-world datasets; the results indicates that the designed one-phase HIMU algorithm can address the "rare item problem" and has better performance than the state-of-the-art algorithms in terms of runtime, memory usage, and scalability. Furthermore, the enhanced algorithms outperform the non-optimized HIMU approach.
△ Less
Submitted 25 February, 2019;
originally announced February 2019.
-
Utility Mining Across Multi-Dimensional Sequences
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Jiexiong Zhang,
Hongzhi Yin,
Philippe Fournier-Viger,
Han-Chieh Chao,
Philip S. Yu
Abstract:
Knowledge extraction from database is the fundamental task in database and data mining community, which has been applied to a wide range of real-world applications and situations. Different from the support-based mining models, the utility-oriented mining framework integrates the utility theory to provide more informative and useful patterns. Time-dependent sequence data is commonly seen in real l…
▽ More
Knowledge extraction from database is the fundamental task in database and data mining community, which has been applied to a wide range of real-world applications and situations. Different from the support-based mining models, the utility-oriented mining framework integrates the utility theory to provide more informative and useful patterns. Time-dependent sequence data is commonly seen in real life. Sequence data has been widely utilized in many applications, such as analyzing sequential user behavior on the Web, influence maximization, route planning, and targeted marketing. Unfortunately, all the existing algorithms lose sight of the fact that the processed data not only contain rich features (e.g., occur quantity, risk, profit, etc.), but also may be associated with multi-dimensional auxiliary information, e.g., transaction sequence can be associated with purchaser profile information. In this paper, we first formulate the problem of utility mining across multi-dimensional sequences, and propose a novel framework named MDUS to extract Multi-Dimensional Utility-oriented Sequential useful patterns. Two algorithms respectively named MDUS_EM and MDUS_SD are presented to address the formulated problem. The former algorithm is based on database transformation, and the later one performs pattern joins and a searching method to identify desired patterns across multi-dimensional sequences. Extensive experiments are carried on five real-life datasets and one synthetic dataset to show that the proposed algorithms can effectively and efficiently discover the useful knowledge from multi-dimensional sequential databases. Moreover, the MDUS framework can provide better insight, and it is more adaptable to real-life situations than the current existing models.
△ Less
Submitted 25 February, 2019;
originally announced February 2019.
-
HUOPM: High Utility Occupancy Pattern Mining
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Philippe Fournier-Viger,
Han-Chieh Chao,
Philip S. Yu
Abstract:
Mining useful patterns from varied types of databases is an important research topic, which has many real-life applications. Most studies have considered the frequency as sole interestingness measure for identifying high quality patterns. However, each object is different in nature. The relative importance of objects is not equal, in terms of criteria such as the utility, risk, or interest. Beside…
▽ More
Mining useful patterns from varied types of databases is an important research topic, which has many real-life applications. Most studies have considered the frequency as sole interestingness measure for identifying high quality patterns. However, each object is different in nature. The relative importance of objects is not equal, in terms of criteria such as the utility, risk, or interest. Besides, another limitation of frequent patterns is that they generally have a low occupancy, i.e., they often represent small sets of items in transactions containing many items, and thus may not be truly representative of these transactions. To extract high quality patterns in real life applications, this paper extends the occupancy measure to also assess the utility of patterns in transaction databases. We propose an efficient algorithm named High Utility Occupancy Pattern Mining (HUOPM). It considers user preferences in terms of frequency, utility, and occupancy. A novel Frequency-Utility tree (FU-tree) and two compact data structures, called the utility-occupancy list and FU-table, are designed to provide global and partial downward closure properties for pruning the search space. The proposed method can efficiently discover the complete set of high quality patterns without candidate generation. Extensive experiments have been conducted on several datasets to evaluate the effectiveness and efficiency of the proposed algorithm. Results show that the derived patterns are intelligible, reasonable and acceptable, and that HUOPM with its pruning strategies outperforms the state-of-the-art algorithm, in terms of runtime and search space, respectively.
△ Less
Submitted 28 December, 2018;
originally announced December 2018.
-
Privacy Preserving Utility Mining: A Survey
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Han-Chieh Chao,
Shyue-Liang Wang,
Philip S. Yu
Abstract:
In big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with…
▽ More
In big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve better trade-off between utility maximizing and privacy preserving, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.
△ Less
Submitted 18 November, 2018;
originally announced November 2018.
-
A Survey of Parallel Sequential Pattern Mining
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Philippe Fournier-Viger,
Han-Chieh Chao,
Philip S. Yu
Abstract:
With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. As a fundamental task of data mining, sequential pattern mining (SPM) is used in a wide variety of real-life applicat…
▽ More
With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. As a fundamental task of data mining, sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than other pattern mining tasks, i.e., frequent itemset mining and association rule mining, and also suffers from the above challenges when handling the large-scale data. To solve these problems, mining sequential patterns in a parallel or distributed computing environment has emerged as an important issue with many applications. In this paper, an in-depth survey of the current status of parallel sequential pattern mining (PSPM) is investigated and provided, including detailed categorization of traditional serial SPM approaches, and state of the art parallel SPM. We review the related work of parallel sequential pattern mining in detail, including partition-based algorithms for PSPM, Apriori-based PSPM, pattern growth based PSPM, and hybrid algorithms for PSPM, and provide deep description (i.e., characteristics, advantages, disadvantages and summarization) of these parallel approaches of PSPM. Some advanced topics for PSPM, including parallel quantitative / weighted / utility sequential pattern mining, PSPM from uncertain data and stream data, hardware acceleration for PSPM, are further reviewed in details. Besides, we review and provide some well-known open-source software of PSPM. Finally, we summarize some challenges and opportunities of PSPM in the big data era.
△ Less
Submitted 4 April, 2019; v1 submitted 26 May, 2018;
originally announced May 2018.
-
A Survey of Utility-Oriented Pattern Mining
Authors:
Wensheng Gan,
Jerry Chun-Wei Lin,
Philippe Fournier-Viger,
Han-Chieh Chao,
Vincent S. Tseng,
Philip S. Yu
Abstract:
The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit,…
▽ More
The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, satisfaction, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM, or called utility mining). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented in detail, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.
△ Less
Submitted 16 September, 2019; v1 submitted 26 May, 2018;
originally announced May 2018.
-
On the error of a priori sampling: zero forcing sets and propagation time
Authors:
Franklin H. J. Kenter,
Jephian C. -H. Lin
Abstract:
Zero forcing is an iterative process on a graph used to bound the maximum nullity. The process begins with select vertices as colored, and the remaining vertices can become colored under a specific color change rule. The goal is to find a minimum set of vertices such that after iteratively applying the rule, all of the vertices become colored (i.e., a minimum zero forcing set). Of particular inter…
▽ More
Zero forcing is an iterative process on a graph used to bound the maximum nullity. The process begins with select vertices as colored, and the remaining vertices can become colored under a specific color change rule. The goal is to find a minimum set of vertices such that after iteratively applying the rule, all of the vertices become colored (i.e., a minimum zero forcing set). Of particular interest is the propagation time of a chosen set which is the number of steps the rule must be applied in order to color all the vertices of a graph.
We give a purely linear algebraic interpretation of zero forcing: Find a set of vertices $S$ such that for any weighted adjacency matrix $\mathbf{A}$, whenever $\mathbf{Ax} = \mathbf{0}$, the entirety of of $\mathbf{x}$ can be recovered using only $\mathbf{x}_S$, the entries corresponding to $S$. The key here is that $S$ must be chosen before $\mathbf{A}$. In this light, we are able to give a linear algebraic interpretation of the propagation time: Any error in $\mathbf{x}_S$ effects the error of $\mathbf{x}$ exponentially in the propagation time. This error can be quantitatively measured using newly defined zero forcing-related parameters, the error polynomial vector and the variance polynomial vector. In this sense, the quality of two zero forcing sets can objectively be compared even if the sets are the same size and their propagation time is the same. Examples and constructions are given.
△ Less
Submitted 25 September, 2017;
originally announced September 2017.
-
Analogies between the crossing number and the tangle crossing number
Authors:
Robin Anderson,
Shuliang Bai,
Fidel Barrera-Cruz,
Éva Czabarka,
Giordano Da Lozzo,
Natalie L. F. Hobson,
Jephian C. -H. Lin,
Austin Mohr,
Heather C. Smith,
László A. Székely,
Hays Whitlatch
Abstract:
Tanglegrams are special graphs that consist of a pair of rooted binary trees with the same number of leaves, and a perfect matching between the two leaf-sets. These objects are of use in phylogenetics and are represented with straightline drawings where the leaves of the two plane binary trees are on two parallel lines and only the matching edges can cross. The tangle crossing number of a tanglegr…
▽ More
Tanglegrams are special graphs that consist of a pair of rooted binary trees with the same number of leaves, and a perfect matching between the two leaf-sets. These objects are of use in phylogenetics and are represented with straightline drawings where the leaves of the two plane binary trees are on two parallel lines and only the matching edges can cross. The tangle crossing number of a tanglegram is the minimum crossing number over all such drawings and is related to biologically relevant quantities, such as the number of times a parasite switched hosts.
Our main results for tanglegrams which parallel known theorems for crossing numbers are as follows. The removal of a single matching edge in a tanglegram with $n$ leaves decreases the tangle crossing number by at most $n-3$, and this is sharp. Additionally, if $γ(n)$ is the maximum tangle crossing number of a tanglegram with $n$ leaves, we prove $\frac{1}{2}\binom{n}{2}(1-o(1))\leγ(n)<\frac{1}{2}\binom{n}{2}$. Further, we provide an algorithm for computing non-trivial lower bounds on the tangle crossing number in $O(n^4)$ time. This lower bound may be tight, even for tanglegrams with tangle crossing number $Θ(n^2)$.
△ Less
Submitted 23 September, 2017;
originally announced September 2017.
-
Theory of One Tape Linear Time Turing Machines
Authors:
Kohtaro Tadaki,
Tomoyuki Yamakami,
Jack C. H. Lin
Abstract:
A theory of one-tape (one-head) linear-time Turing machines is essentially different from its polynomial-time counterpart since these machines are closely related to finite state automata. This paper discusses structural-complexity issues of one-tape Turing machines of various types (deterministic, nondeterministic, reversible, alternating, probabilistic, counting, and quantum Turing machines) t…
▽ More
A theory of one-tape (one-head) linear-time Turing machines is essentially different from its polynomial-time counterpart since these machines are closely related to finite state automata. This paper discusses structural-complexity issues of one-tape Turing machines of various types (deterministic, nondeterministic, reversible, alternating, probabilistic, counting, and quantum Turing machines) that halt in linear time, where the running time of a machine is defined as the length of any longest computation path. We explore structural properties of one-tape linear-time Turing machines and clarify how the machines' resources affect their computational patterns and power.
△ Less
Submitted 17 July, 2009; v1 submitted 23 October, 2003;
originally announced October 2003.