Showing 1–37 of 37 results for author: Yeh, C M

  1. Masked Graph Transformer for Large-Scale Recommendation

    Authors: Huiyuan Chen, Zhe Xu, Chin-Chia Michael Yeh, Vivian Lai, Yan Zheng, Minghua Xu, Hanghang Tong

    Abstract: Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, the quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturi…

    Submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2402.10487  [pdf, other]

    cs.LG cs.AI

    RPMixer: Shaking Up Time Series Forecasting with Random Projections for Large Spatial-Temporal Data

    Authors: Chin-Chia Michael Yeh, Yujie Fan, Xin Dai, Uday Singh Saini, Vivian Lai, Prince Osei Aboagye, Junpeng Wang, Huiyuan Chen, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang

    Abstract: Spatial-temporal forecasting systems play a crucial role in addressing numerous real-world challenges. In this paper, we investigate the potential of addressing spatial-temporal forecasting problems using general time series forecasting models, i.e., models that do not leverage the spatial relationships among the nodes. We propose an all-Multi-Layer Perceptron (all-MLP) time series forecasting arch…

    Submitted 12 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2401.09489  [pdf]

    cs.LG cs.AI

    PUPAE: Intuitive and Actionable Explanations for Time Series Anomalies

    Authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn J. Keogh

    Abstract: In recent years there has been significant progress in time series anomaly detection. However, after detecting an (perhaps tentative) anomaly, can we explain it? Such explanations would be useful to triage anomalies. For example, in an oil refinery, should we respond to an anomaly by dispatching a hydraulic engineer, or an intern to replace the battery on a sensor? There have been some parallel ef…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 9 Page Manuscript, 1 Page Supplementary (Supplement not published in conference proceedings.)

    Journal ref: SIAM SDM 2024

  4. arXiv:2311.03393  [pdf, other]

    cs.DB cs.AI

    Sketching Multidimensional Time Series for Fast Discord Mining

    Authors: Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh

    Abstract: Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discord effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with…

    Submitted 7 December, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

  5. arXiv:2311.02563  [pdf, other]

    cs.DB cs.AI cs.CR cs.LG

    Time Series Synthesis Using the Matrix Profile for Anonymization

    Authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

    Abstract: Publishing and sharing data is crucial for the data mining community, allowing collaboration and driving open innovation. However, many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information. To alleviate such issues, we propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be rel…

    Submitted 5 November, 2023; originally announced November 2023.

  6. arXiv:2311.02561  [pdf, other]

    cs.LG cs.AI

    Ego-Network Transformer for Subsequence Classification in Time Series Data

    Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Yujie Fan, Xin Dai, Yan Zheng, Vivian Lai, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

    Abstract: Time series classification is a widely studied problem in the field of time series data mining. Previous research has predominantly focused on scenarios where relevant or foreground subsequences have already been extracted, with each subsequence corresponding to a single label. However, real-world time series data often contain foreground subsequences that are intertwined with background subsequen…

    Submitted 5 November, 2023; originally announced November 2023.

  7. arXiv:2311.02560  [pdf, other]

    cs.IR cs.LG

    Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights

    Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Yujie Fan, Vivian Lai, Junpeng Wang, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang

    Abstract: Time series data is ubiquitous across various domains such as finance, healthcare, and manufacturing, but their properties can vary significantly depending on the domain they originate from. The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples. However, existing CTSR works typically focus on retrieving time series from a single d…

    Submitted 5 November, 2023; originally announced November 2023.

  8. FATA-Trans: Field And Time-Aware Transformer for Sequential Tabular Data

    Authors: Dongyu Zhang, Liang Wang, Xin Dai, Shubham Jain, Junpeng Wang, Yujie Fan, Chin-Chia Michael Yeh, Yan Zheng, Zhongfang Zhuang, Wei Zhang

    Abstract: Sequential tabular data is one of the most commonly used data types in real-world applications. Different from conventional tabular data, where rows in a table are independent, sequential tabular data contains rich contextual and sequential information, where some fields are dynamically changing over time and others are static. Existing transformer-based approaches analyzing sequential tabular dat…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: This work is accepted by ACM International Conference on Information and Knowledge Management (CIKM) 2023

  9. arXiv:2310.03925  [pdf, other]

    cs.LG cs.AI

    Multitask Learning for Time Series Data with 2D Convolution

    Authors: Chin-Chia Michael Yeh, Xin Dai, Yan Zheng, Junpeng Wang, Huiyuan Chen, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang

    Abstract: Multitask learning (MTL) aims to develop a unified model that can handle a set of closely related tasks simultaneously. By optimizing the model across multiple tasks, MTL generally surpasses its non-MTL counterparts in terms of generalizability. Although MTL has been extensively researched in various domains such as computer vision, natural language processing, and recommendation systems, its appl…

    Submitted 10 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  10. arXiv:2310.03919  [pdf, other]

    cs.IR cs.AI cs.LG

    An Efficient Content-based Time Series Retrieval System

    Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips

    Abstract: A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series emerged from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated met…

    Submitted 5 October, 2023; originally announced October 2023.

  11. arXiv:2310.03916  [pdf, other]

    cs.LG cs.AI

    Toward a Foundation Model for Time Series Data

    Authors: Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Audrey Der, Vivian Lai, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang

    Abstract: A foundation model is a machine learning model trained on a large and diverse set of data, typically using self-supervised learning-based pre-training techniques, that can be adapted to various downstream tasks. However, current research on time series pre-training has mostly focused on models pre-trained solely on data from a single domain, resulting in a lack of knowledge about other types of ti…

    Submitted 5 October, 2023; originally announced October 2023.

  12. arXiv:2309.15169  [pdf, other]

    cs.LG cs.AI

    Revealing the Power of Spatial-Temporal Masked Autoencoders in Multivariate Time Series Forecasting

    Authors: Jiarui Sun, Yujie Fan, Chin-Chia Michael Yeh, Wei Zhang, Girish Chowdhary

    Abstract: Multivariate time series (MTS) forecasting involves predicting future time series data based on historical observations. Existing research primarily emphasizes the development of complex spatial-temporal models that capture spatial dependencies and temporal correlations among time series variables explicitly. However, recent advances have been impeded by challenges relating to data scarcity and mo…

    Submitted 26 September, 2023; originally announced September 2023.

  13. Hessian-aware Quantized Node Embeddings for Recommendation

    Authors: Huiyuan Chen, Kaixiong Zhou, Kwei-Herng Lai, Chin-Chia Michael Yeh, Yan Zheng, Xia Hu, Hao Yang

    Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, the process of searching and ranking from a large item corpus usually requires high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods compress user/item representations into the binary embedding space to reduce space…

    Submitted 2 September, 2023; originally announced September 2023.

  14. Adversarial Collaborative Filtering for Free

    Authors: Huiyuan Chen, Xiaoting Li, Vivian Lai, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Mahashweta Das, Hao Yang

    Abstract: Collaborative Filtering (CF) has been successfully used to help users discover the items of interest. Nevertheless, existing CF methods suffer from the noisy data issue, which negatively impacts the quality of recommendation. To tackle this problem, many prior studies leverage adversarial learning to regularize the representations of users/items, which improves both generalizability and robustness. Th…

    Submitted 20 August, 2023; originally announced August 2023.

  15. Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation

    Authors: Vivian Lai, Huiyuan Chen, Chin-Chia Michael Yeh, Minghua Xu, Yiwei Cai, Hao Yang

    Abstract: Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability of capturing a user's dynamic interests from their past interactions. Despite their success, Transformer-based models often require the optimization of a large number of parameters, making them difficult to train from sparse data in sequential recommendation. To address the prob…

    Submitted 20 August, 2023; originally announced August 2023.

  16. EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding

    Authors: Yan Zheng, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Huiyuan Chen, Liang Wang, Wei Zhang

    Abstract: Embedding learning transforms discrete data entities into continuous numerical representations, encoding features/properties of the entities. Despite the outstanding performance reported from different embedding learning algorithms, few efforts were devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embeddi…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 5 pages, 3 figures, accepted by PacificVis 2023

  17. arXiv:2307.08910  [pdf, other]

    cs.LG cs.IR

    Sharpness-Aware Graph Collaborative Filtering

    Authors: Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang

    Abstract: Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is es…

    Submitted 17 July, 2023; originally announced July 2023.

  18. arXiv:2306.01913  [pdf, other]

    cs.AI

    PDT: Pretrained Dual Transformers for Time-aware Bipartite Graphs

    Authors: Xin Dai, Yujie Fan, Zhongfang Zhuang, Shubham Jain, Chin-Chia Michael Yeh, Junpeng Wang, Liang Wang, Yan Zheng, Prince Osei Aboagye, Wei Zhang

    Abstract: Pre-training on large models is prevalent and emerging with the ever-growing user-generated content in many machine learning application categories. It has been recognized that learning contextual knowledge from the datasets depicting user-content interaction plays a vital role in downstream tasks. Despite several studies attempting to learn contextual knowledge via pre-training methods, finding a…

    Submitted 25 September, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  19. arXiv:2303.13731  [pdf, other]

    cs.LG cs.CV cs.HC

    How Does Attention Work in Vision Transformers? A Visual Analytics Attempt

    Authors: Yiran Li, Junpeng Wang, Xin Dai, Liang Wang, Chin-Chia Michael Yeh, Yan Zheng, Wei Zhang, Kwan-Liu Ma

    Abstract: Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attentions are then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted by PacificVis 2023 and selected to be published in TVCG

  20. arXiv:2212.06146  [pdf]

    cs.LG cs.AI

    Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series

    Authors: Audrey Der, Chin-Chia Michael Yeh, Renjie Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

    Abstract: The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such…

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted at IEEE ICKG 2022. (Previously entitled IEEE ICBK.) Abridged abstract as per arxiv's requirements

  21. TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems

    Authors: Huiyuan Chen, Xiaoting Li, Kaixiong Zhou, Xia Hu, Chin-Chia Michael Yeh, Yan Zheng, Hao Yang

    Abstract: There has been an explosion of interest in designing various Knowledge Graph Neural Networks (KGNNs), which achieve state-of-the-art performance and provide great explainability for recommendation. The promising performance is mainly resulting from their capability of capturing high-order proximity messages over the knowledge graphs. However, training KGNNs at scale is challenging due to the high…

    Submitted 8 December, 2022; originally announced December 2022.

  22. Denoising Self-attentive Sequential Recommendation

    Authors: Huiyuan Chen, Yusan Lin, Menghai Pan, Lan Wang, Chin-Chia Michael Yeh, Xiaoting Li, Yan Zheng, Fei Wang, Hao Yang

    Abstract: Transformer-based sequential recommenders are very powerful for capturing both short-term and long-term sequential item dependencies. This is mainly attributed to their unique self-attention networks to exploit pairwise item-item interactions within the sequence. However, real-world item sequences are often noisy, which is particularly true for implicit feedback. For example, a large portion of cl…

    Submitted 8 December, 2022; originally announced December 2022.

  23. arXiv:2212.03449  [pdf, other]

    cs.LG cs.SI

    Dynamic Graph Node Classification via Time Augmentation

    Authors: Jiarui Sun, Mengting Gu, Chin-Chia Michael Yeh, Yujie Fan, Girish Chowdhary, Wei Zhang

    Abstract: Node classification for graph-structured data aims to classify nodes whose labels are unknown. While studies on static graphs are prevalent, few studies have focused on dynamic graph node classification. Node classification on dynamic graphs is challenging for two reasons. First, the model needs to capture both structural and temporal information, particularly on dynamic graphs with a long history…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE Big Data 2022

  24. arXiv:2208.05648  [pdf, other]

    cs.LG cs.AI cs.DB

    Embedding Compression with Hashing for Efficient Representation Learning in Large-Scale Graph

    Authors: Chin-Chia Michael Yeh, Mengting Gu, Yan Zheng, Huiyuan Chen, Javid Ebrahimi, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang

    Abstract: Graph neural networks (GNNs) are deep learning models designed specifically for graph data, and they typically rely on node features as the input to the first layer. When applying such a type of network on the graph without node features, one can extract simple graph-based node features (e.g., number of degrees) or learn the input node representations (i.e., embeddings) when training the network.…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 11 pages

  25. arXiv:2201.07849  [pdf, other]

    cs.LG cs.HC

    Learning-From-Disagreement: A Model Comparison and Visual Analytics Framework

    Authors: Junpeng Wang, Liang Wang, Yan Zheng, Chin-Chia Michael Yeh, Shubham Jain, Wei Zhang

    Abstract: With the fast-growing number of classification models being produced every day, numerous model interpretation and comparison solutions have also been introduced. For example, LIME and SHAP can interpret what input features contribute more to a classifier's output predictions. Different numerical metrics (e.g., accuracy) can be used to easily compare two classifiers. However, few works can interpre…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: 16 pages, 13 figures

  26. arXiv:2112.12965  [pdf, other]

    cs.DB cs.LG

    Error-bounded Approximate Time Series Joins Using Compact Dictionary Representations of Time Series

    Authors: Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Wei Zhang, Eamonn Keogh

    Abstract: The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using intra-similarity join (i.e., self-join) or join a time series with another time series using inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both…

    Submitted 5 November, 2023; v1 submitted 24 December, 2021; originally announced December 2021.

  27. Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks

    Authors: Chin-Chia Michael Yeh, Zhongfang Zhuang, Junpeng Wang, Yan Zheng, Javid Ebrahimi, Ryan Mercer, Liang Wang, Wei Zhang

    Abstract: Predicting metrics associated with entities' transactional behavior within payment processing networks is essential for system monitoring. Multivariate time series, aggregated from the past transaction history, can provide valuable insights for such prediction. The general multivariate time series prediction problem has been well studied and applied across several domains, including manufacturing,…

    Submitted 22 September, 2021; v1 submitted 21 September, 2021; originally announced September 2021.

    Comments: 10 pages

  28. arXiv:2104.02797  [pdf, other]

    cs.CL cs.HC

    VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

    Authors: Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

    Abstract: Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this paper, we utilize interactive visualization to increase the interpretability and accessibility of a collection of state-of-the-art debiasing techniques. To aid this,…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 11 pages

  29. arXiv:2011.02602  [pdf, other]

    cs.LG cs.AI cs.IR

    Merchant Category Identification Using Credit Card Transactions

    Authors: Chin-Chia Michael Yeh, Zhongfang Zhuang, Yan Zheng, Liang Wang, Junpeng Wang, Wei Zhang

    Abstract: Digital payment volume has proliferated in recent years with the rapid growth of small businesses and online shops. When processing these digital transactions, recognizing each merchant's real identity (i.e., business type) is vital to ensure the integrity of payment processing systems. Conventionally, this problem is formulated as a time series classification problem solely using the merchant tra…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: 9 pages

  30. arXiv:2009.10989  [pdf, other]

    cs.LG cs.AI cs.DB cs.IR stat.ML

    Towards a Flexible Embedding Learning Framework

    Authors: Chin-Chia Michael Yeh, Dhruv Gelda, Zhongfang Zhuang, Yan Zheng, Liang Gou, Wei Zhang

    Abstract: Representation learning is a fundamental building block for analyzing entities in a database. While the existing embedding learning methods are effective in various data mining problems, their applicability is often limited because these methods have pre-determined assumptions on the type of semantics captured by the learned embeddings, and the assumptions may not well align with specific downstre…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: 10 pages

  31. arXiv:2008.01670  [pdf, other]

    q-fin.ST cs.LG stat.ML

    Multi-stream RNN for Merchant Transaction Prediction

    Authors: Zhongfang Zhuang, Chin-Chia Michael Yeh, Liang Wang, Wei Zhang, Junpeng Wang

    Abstract: Recently, digital payment systems have significantly changed people's lifestyles. New challenges have surfaced in monitoring and guaranteeing the integrity of payment processing systems. One important task is to predict the future transaction statistics of each merchant. These predictions can thus be used to steer other tasks, ranging from fraud detection to recommendation. This problem is challen…

    Submitted 24 July, 2020; originally announced August 2020.

    Comments: Accepted by KDD 2020 Workshop on Machine Learning in Finance

  32. arXiv:2007.05303  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-future Merchant Transaction Prediction

    Authors: Chin-Chia Michael Yeh, Zhongfang Zhuang, Wei Zhang, Liang Wang

    Abstract: The multivariate time series generated from merchant transaction history can provide critical insights for payment processing companies. The capability of predicting merchants' future is crucial for fraud detection and recommendation systems. Conventionally, this problem is formulated to predict one multivariate time series under the multi-horizon setting. However, real-world applications often re…

    Submitted 10 July, 2020; originally announced July 2020.

  33. arXiv:1910.05862  [pdf, other]

    cs.LG stat.ML

    Constrained Non-Affine Alignment of Embeddings

    Authors: Yuwei Wang, Yan Zheng, Yanqing Peng, Chin-Chia Michael Yeh, Zhongfang Zhuang, Das Mahashweta, Bendre Mangesh, Feifei Li, Wei Zhang, Jeff M. Phillips

    Abstract: Embeddings are one of the fundamental building blocks for data analysis tasks. Embeddings are already essential tools for large language models and image analysis, and their use is being extended to many other research domains. The generation of these distributed representations is often a data- and computation-expensive process; yet the holistic analysis and adjustment of them after they have bee…

    Submitted 19 November, 2021; v1 submitted 13 October, 2019; originally announced October 2019.

  34. arXiv:1811.03149  [pdf]

    cs.LG stat.ML

    Time Series Classification to Improve Poultry Welfare

    Authors: Alireza Abdoli, Amy C. Murillo, Chin-Chia M. Yeh, Alec C. Gerry, Eamonn J. Keogh

    Abstract: Poultry farms are an important contributor to the human food chain. Worldwide, humankind keeps an enormous number of domesticated birds (e.g. chickens) for their eggs and their meat, providing rich sources of low-fat protein. However, around the world, there have been growing concerns about the quality of life for the livestock in poultry farms; and increasingly vocal demands for improved standard…

    Submitted 7 November, 2018; originally announced November 2018.

  35. arXiv:1811.03064  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards a Near Universal Time Series Data Mining Tool: Introducing the Matrix Profile

    Authors: Chin-Chia Michael Yeh

    Abstract: The last decade has seen a flurry of research on all-pairs-similarity-search (or, self-join) for text, DNA, and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. Surprisingly, however, little progress has been made on addressing this problem for time series subsequences. In this thesis, we have introduced a near universal time series data minin…

    Submitted 11 July, 2020; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: PhD dissertation (2018)

  36. arXiv:1811.01557  [pdf, other]

    cs.LG cs.AI stat.ML

    Representation Learning by Reconstructing Neighborhoods

    Authors: Chin-Chia Michael Yeh, Yan Zhu, Evangelos E. Papalexakis, Abdullah Mueen, Eamonn Keogh

    Abstract: Since its introduction, unsupervised representation learning has attracted a lot of attention from the research community, as it is demonstrated to be highly effective and easy-to-apply in tasks such as dimension reduction, clustering, visualization, information retrieval, and semi-supervised learning. In this work, we propose a novel unsupervised representation learning framework called neighbor-…

    Submitted 6 November, 2018; v1 submitted 5 November, 2018; originally announced November 2018.

  37. arXiv:1810.07758  [pdf, other]

    cs.LG stat.ML

    The UCR Time Series Archive

    Authors: Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Eamonn Keogh

    Abstract: The UCR Time Series Archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one data set from the archive. The original incarnation of the archive had sixteen data sets but since that time, it has gone through periodic expansions. The last expansion took place in the summer of 2015 w…

    Submitted 8 September, 2019; v1 submitted 17 October, 2018; originally announced October 2018.