-
Dimension Reduction with Prior Information for Knowledge Discovery
Authors:
Anh Tuan Bui
Abstract:
This paper addresses the problem of mapping high-dimensional data to a low-dimensional space in the presence of other known features. This problem is ubiquitous in science and engineering because most applications involve controllable or measurable features. To solve it, this paper proposes a broad class of methods referred to as conditional multidimensional scaling (MDS). An algorithm for optimizing the objective function of conditional MDS is also developed, and its convergence is proven under mild assumptions. Conditional MDS is illustrated with kinship terms, facial expressions, textile fabrics, car-brand perception, and cylinder machining examples. These examples demonstrate the advantages of conditional MDS over conventional dimension reduction: it improves the estimation quality of the reduced-dimension space and simplifies visualization and knowledge discovery tasks. Computer code for this work is available in the open-source cml R package.
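To make the idea of conditioning on known features concrete, here is a minimal Python sketch. It assumes, hypothetically, that the known features V are appended to the unknown low-dimensional coordinates Z, that the raw stress of the combined configuration is matched to given dissimilarities D, and that only Z is optimized, here by plain gradient descent. The helper names (`pairwise_dist`, `conditional_stress`, `fit_conditional_mds`), the step size, and the toy data are illustrative assumptions; this is not the paper's objective function or its convergence-proven algorithm, for which the cml R package is the reference implementation.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def conditional_stress(Z, V, D):
    """Raw stress sum_{i<j} (D_ij - ||x_i - x_j||)^2 with x_i = [z_i, v_i] (assumed form)."""
    E = pairwise_dist(np.hstack([Z, V]))
    iu = np.triu_indices(D.shape[0], k=1)
    return float(((D[iu] - E[iu]) ** 2).sum())

def fit_conditional_mds(D, V, q=2, lr=0.005, n_iter=500, seed=0):
    """Hypothetical fitter: gradient descent on the q free columns Z; known features V stay fixed."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(scale=0.1, size=(D.shape[0], q))
    for _ in range(n_iter):
        X = np.hstack([Z, V])
        diff = X[:, None, :] - X[None, :, :]        # x_i - x_j, shape (n, n, p)
        E = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(E, 1.0)                     # dummy value to avoid 0/0; zeroed below
        coef = (E - D) / E
        np.fill_diagonal(coef, 0.0)
        # d stress / d x_i = 2 * sum_j (E_ij - D_ij) / E_ij * (x_i - x_j)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        Z -= lr * grad[:, :q]                        # update only the unknown coordinates
    return Z

# Toy data: dissimilarities generated from 2 hidden dimensions plus 1 known feature.
rng = np.random.default_rng(1)
Z_true = rng.normal(size=(20, 2))
V = rng.normal(size=(20, 1))
D = pairwise_dist(np.hstack([Z_true, V]))

Z_init = fit_conditional_mds(D, V, n_iter=0)         # starting configuration only
Z_hat = fit_conditional_mds(D, V)
print(conditional_stress(Z_init, V, D), conditional_stress(Z_hat, V, D))
```

In this sketch, holding V fixed means the free coordinates only have to explain the part of the dissimilarity structure that the known features do not already account for.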
Submitted 29 December, 2023; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors
Authors:
Kungang Zhang,
Anh T. Bui,
Daniel W. Apley
Abstract:
Supervised learning models are among the most fundamental classes of models. From a probabilistic perspective, the training data to which such a model is fitted are usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated by a phenomenon called concept drift: changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$, which can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood of the fitted model, using a form of multivariate exponentially weighted moving average, which detects general changes in the mean of a random vector. Despite the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach had not previously been considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes (as shown in theory and experiments), and inherent diagnostic capabilities that help identify the nature of the changes.
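The score-based monitoring idea can be sketched in Python for a logistic regression model. The sketch assumes that the score vector for observation i is the gradient of the Bernoulli log-likelihood, (y_i - p_i) x_i, evaluated at the parameters fitted on training data; that its covariance Σ is estimated from the training scores; and that the standard multivariate EWMA recursion z_t = λ s_t + (1 - λ) z_{t-1} is applied with statistic T²_t = z_tᵀ (λ/(2 - λ) Σ)⁻¹ z_t. The smoothing constant λ = 0.1, the simulated drift, and all function names are illustrative assumptions; the paper's exact standardization and control limits may differ.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson maximum likelihood for logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        H = X.T @ (X * (p * (1 - p))[:, None])       # observed information matrix
        beta += np.linalg.solve(H, X.T @ (y - p))     # Newton step
    return beta

def score_vectors(X, y, beta):
    """Row i is the gradient of log p(y_i | x_i; beta) w.r.t. beta: (y_i - p_i) x_i."""
    return (y - sigmoid(X @ beta))[:, None] * X

def mewma_t2(S, Sigma, lam=0.1):
    """MEWMA T^2 for a stream of score vectors, using the steady-state covariance lam/(2-lam) * Sigma."""
    Sz_inv = np.linalg.inv(lam / (2.0 - lam) * Sigma)
    z = np.zeros(S.shape[1])
    t2 = np.empty(len(S))
    for t, s in enumerate(S):
        z = lam * s + (1.0 - lam) * z
        t2[t] = z @ Sz_inv @ z
    return t2

def simulate(n, beta, rng):
    """Hypothetical data generator: intercept plus two standard normal covariates."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = rng.binomial(1, sigmoid(X @ beta)).astype(float)
    return X, y

rng = np.random.default_rng(0)
X_tr, y_tr = simulate(2000, np.array([0.0, 1.0, -1.0]), rng)
beta_hat = fit_logistic(X_tr, y_tr)
Sigma = np.cov(score_vectors(X_tr, y_tr, beta_hat), rowvar=False)

# Monitoring stream: 200 in-control points, then drift in the coefficient of the 2nd covariate.
X0, y0 = simulate(200, np.array([0.0, 1.0, -1.0]), rng)
X1, y1 = simulate(200, np.array([0.0, 1.0, 1.0]), rng)
S_new = score_vectors(np.vstack([X0, X1]), np.concatenate([y0, y1]), beta_hat)
t2 = mewma_t2(S_new, Sigma)
print(t2[:200].mean(), t2[200:].mean())
```

An alarm would be raised when T²_t exceeds a control limit h, typically chosen (for example, by simulation) to give an acceptable in-control average run length; the sketch leaves h unspecified. Because each coordinate of z_t corresponds to one model parameter, checking which coordinates dominate after an alarm gives a rough indication of which part of the predictive relationship changed.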
Submitted 12 September, 2022; v1 submitted 12 December, 2020;
originally announced December 2020.
-
A monitoring and diagnostic approach for stochastic textured surfaces
Authors:
Anh Tuan Bui,
Daniel W. Apley
Abstract:
We develop a supervised-learning-based approach for monitoring and diagnosing texture-related defects in manufactured products characterized by stochastic textured surfaces that satisfy the locality and stationarity properties of Markov random fields. Examples of stochastic textured surface data include images of woven textiles; image or surface metrology data for machined, cast, or formed metal parts; and microscopy images of material microstructure samples. To characterize the complex spatial statistical dependencies of in-control samples of the stochastic textured surface, we use generic supervised learning methods, which provide an implicit characterization of the joint distribution of the surface texture. We propose two spatial moving statistics, computed from the residual errors of the fitted supervised learning model, for monitoring and diagnosing local aberrations in the general spatial statistical behavior of newly manufactured stochastic textured surface samples in a statistical process control context. We illustrate the approach using images of textile fabric samples and simulated 2-D stochastic processes, for which the algorithm successfully detects local defects of various kinds. Supplemental discussions, results, data, and computer code are available online.
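A toy version of the residual-based monitoring idea can be sketched in Python. As assumptions, it swaps in ordinary least squares on a causal 3-pixel neighborhood (up, left, up-left) as the supervised learner, simulates the in-control texture as a simple 2-D autoregressive field, and uses the mean squared residual over a sliding w-by-w window as the spatial moving statistic. These choices, the threshold rule, and the function names are hypothetical and are not the two statistics proposed in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def simulate_texture(n, sigma, rng, a=0.5, b=0.3):
    """2-D causal autoregressive field X[i,j] = a X[i-1,j] + b X[i,j-1] + e[i,j]."""
    E = rng.normal(size=(n, n)) * sigma        # sigma may be a scalar or an (n, n) array
    X = np.zeros((n, n))
    for i in range(1, n):
        for j in range(1, n):
            X[i, j] = a * X[i - 1, j] + b * X[i, j - 1] + E[i, j]
    return X

def neighborhood_design(X):
    """Predictors (intercept, up, left, up-left) and target for every pixel with i, j >= 1."""
    up, left, upleft = X[:-1, 1:], X[1:, :-1], X[:-1, :-1]
    A = np.column_stack([np.ones(up.size), up.ravel(), left.ravel(), upleft.ravel()])
    return A, X[1:, 1:].ravel()

def residual_image(X, coef):
    """Residuals of the fitted neighborhood model, arranged as an (n-1, n-1) image."""
    A, y = neighborhood_design(X)
    m = X.shape[0] - 1
    return (y - A @ coef).reshape(m, m)

def moving_mean_sq(R, w):
    """Mean squared residual over every w-by-w window (one simple spatial moving statistic)."""
    return sliding_window_view(R ** 2, (w, w)).mean(axis=(-2, -1))

rng = np.random.default_rng(0)
n, w = 80, 9
train = simulate_texture(n, 1.0, rng)                  # in-control sample used for fitting
coef = np.linalg.lstsq(*neighborhood_design(train), rcond=None)[0]

# Threshold from a second in-control image (hypothetical rule: largest in-control window value).
threshold = moving_mean_sq(residual_image(simulate_texture(n, 1.0, rng), coef), w).max()

sigma = np.ones((n, n))
sigma[30:45, 30:45] = 3.0                              # local defect: inflated noise variance
stat = moving_mean_sq(residual_image(simulate_texture(n, sigma, rng), coef), w)
r, c = np.unravel_index(stat.argmax(), stat.shape)
print(threshold, stat.max(), (r, c), int((stat > threshold).sum()))
```

Windows whose statistic exceeds the threshold localize the aberration, which is the diagnostic part of the task; in practice the threshold would be set from many in-control samples to control the false-alarm rate.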
Submitted 21 July, 2017; v1 submitted 9 February, 2017;
originally announced February 2017.