-
Interactive slice visualization for exploring machine learning models
Authors:
Catherine B. Hurley,
Mark O'Connell,
Katarina Domijan
Abstract:
Machine learning models fit complex algorithms to arbitrarily large datasets. These algorithms are well known to be high on performance and low on interpretability. We use interactive visualization of slices of predictor space to address the interpretability deficit, in effect opening up the black box of machine learning algorithms for the purpose of interrogating, explaining, validating and comparing model fits. Slices are specified directly through interaction, or using various touring algorithms designed to visit high-occupancy sections or regions where the model fits have interesting properties. The methods presented here are implemented in the R package condvis2.
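Showing "observed data near a slice" needs some rule for nearness. A minimal sketch, assuming a slice fixes values of a few conditioning predictors, nearness is Euclidean distance on range-scaled conditioning predictors, and weights fall linearly to zero at a hypothetical threshold `sigma`. condvis2 is an R package; its actual similarity measure and API are not reproduced here, and `slice_weights` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def slice_weights(
    rows: Sequence[Dict[str, float]],
    section: Dict[str, float],
    sigma: float = 1.0,
) -> List[float]:
    """Weight each observation by its closeness to a section (slice).

    `section` fixes values for the conditioning predictors; all other
    predictors are left free. Distances are Euclidean on range-scaled
    conditioning predictors, and weights fall linearly from 1 at the
    section to 0 at distance `sigma` (a hypothetical rule for illustration).
    """
    names = list(section)
    # Range of each conditioning predictor, used to put them on a common scale.
    spans = {}
    for name in names:
        values = [row[name] for row in rows]
        spans[name] = (max(values) - min(values)) or 1.0
    weights = []
    for row in rows:
        d = math.sqrt(sum(((row[n] - section[n]) / spans[n]) ** 2 for n in names))
        weights.append(max(0.0, 1.0 - d / sigma))
    return weights


rows = [{"x1": 0.0, "x2": 0.0}, {"x1": 1.0, "x2": 1.0}, {"x1": 10.0, "x2": 10.0}]
print(slice_weights(rows, {"x2": 0.0}, sigma=0.5))
```

Observations with weight zero would simply not be drawn on the slice; a smooth kernel could replace the linear fall-off without changing the idea.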
Submitted 7 September, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Non-separable matrix builders for signal processing, quantum information and MIMO applications
Authors:
Ted Hurley,
Barry Hurley
Abstract:
Matrices are built and designed by applying procedures from lower order matrices. Matrix tensor products, direct sums or multiplication of matrices are such procedures and a matrix built from these is said to be a {\em separable} matrix. A {\em non-separable} matrix is a matrix which is not separable and is often referred to as {\em an entangled matrix}. The matrices built may retain properties of the lower order matrices or may also acquire new desired properties not inherent in the constituents.
Here design methods for non-separable matrices of required types are derived. These can retain properties of lower order matrices or have new desirable properties. Infinite series of required non-separable matrices are constructible by the general methods.
Non-separable matrices are required for applications and other uses; they can capture structure in a unique way and thus perform much better than separable matrices. General new methods are developed with which to construct {\em multidimensional entangled paraunitary matrices}; these have applications for wavelet and filter bank design. The constructions are also used to design new systems of non-separable unitary matrices; these have applications in quantum information theory. Some consequences include the design of full diversity constellations of unitary matrices, which are used in MIMO systems, and methods to design infinite series of special types of Hadamard matrices.
Submitted 15 August, 2023; v1 submitted 3 January, 2021;
originally announced January 2021.
-
Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach
Authors:
José M. Maisog,
Wenhong Li,
Yanchun Xu,
Brian Hurley,
Hetal Shah,
Ryan Lemberg,
Tina Borden,
Stephen Bandeian,
Melissa Schline,
Roxanna Cross,
Alan Spiro,
Russ Michael,
Alexander Gutfraind
Abstract:
Due to escalating healthcare costs, accurately predicting which patients will incur high costs is an important task for payers and providers of healthcare. High-cost claimants (HiCCs) are patients who have annual costs above $250,000 and who represent just 0.16% of the insured population but currently account for 9% of all healthcare costs. In this study, we aimed to develop a high-performance algorithm to predict HiCCs to inform a novel care management system. Using health insurance claims from 48 million people, augmented with census data, we applied machine learning to train binary classification models to calculate the personal risk of HiCC. To train the models, we developed a platform starting with 6,006 variables across all clinical and demographic dimensions and constructed over one hundred candidate models. The best model achieved an area under the receiver operating characteristic curve of 91.2%. The model exceeds the highest published performance (84%) and remains high for patients with no prior history of high-cost status (89%), who have less than a full year of enrollment (87%), or lack pharmacy claims data (88%). It attains an area under the precision-recall curve of 23.1%, and precision of 74% at a threshold of 0.99. A care management program enrolling 500 people with the highest HiCC risk is expected to treat 199 true HiCCs and generate a net savings of $7.3 million per year. Our results demonstrate that high-performing predictive models can be constructed using claims data and publicly available data alone, even for rare high-cost claimants exceeding $250,000. Our model demonstrates the transformational power of machine learning and artificial intelligence in care management, which would allow healthcare payers and providers to introduce the next generation of care management programs.
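The care-management figure is a precision-at-k statement: enrolling the 500 highest-risk people and capturing 199 true HiCCs corresponds to a precision of 199/500 = 39.8% at k = 500. A minimal sketch of that metric follows; `precision_at_k` is a hypothetical helper, and the scores and labels are made up purely for illustration, not data from the study.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true positives among the k highest-scoring people."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of people")
    # Rank people by predicted risk, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Illustrative toy data only.
scores = [0.99, 0.95, 0.40, 0.85, 0.10, 0.70]
labels = [1, 0, 0, 1, 0, 1]
print(precision_at_k(scores, labels, 3))  # top three are 0.99, 0.95, 0.85; two are positive
print(199 / 500)  # the abstract's enrolment figure as a precision: 0.398
```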
Submitted 30 December, 2019;
originally announced December 2019.
-
Maximum distance separable codes to order
Authors:
Ted Hurley,
Donny Hurley,
Barry Hurley
Abstract:
Maximum distance separable (MDS) codes are constructed to required specifications. The codes are explicitly given over finite fields with efficient encoding and decoding algorithms. Series of such codes over finite fields with ratio of distance to length approaching $(1-R)$ for given $R, \, 0 < R < 1$ are derived. For given rate $R=\frac{r}{n}$, with $p$ not dividing $n$, series of codes over finite fields of characteristic $p$ are constructed such that the ratio of the distance to the length approaches $(1-R)$. For a given field $GF(q)$, MDS codes of the form $(q-1,r)$ are constructed for any $r$.
The codes are encompassing, easy to construct with efficient encoding and decoding algorithms of complexity $\max\{O(n\log n), t^2\}$, where $t$ is the error-correcting capability of the code.
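For orientation, an $(n,r)$ linear code is MDS when it meets the Singleton bound, $d = n - r + 1$. The textbook example of a length $q-1$ MDS code over $GF(q)$ is a Reed-Solomon code. The sketch below builds one over a prime field by evaluating every polynomial of degree below $r$ at the nonzero field elements, then checks the bound by brute force. Reed-Solomon is a swapped-in standard construction for illustration only, not the unit-derived construction of the paper, and the function names are hypothetical.

```python
from itertools import product


def rs_codewords(p: int, r: int):
    """All codewords of the Reed-Solomon code of length p-1, dimension r over GF(p), p prime."""
    points = range(1, p)  # the p-1 nonzero elements of GF(p)
    for coeffs in product(range(p), repeat=r):
        # Evaluate the message polynomial sum(c_j * x**j) at every nonzero point, mod p.
        yield tuple(sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p for x in points)


def minimum_distance(p: int, r: int) -> int:
    # For a linear code, minimum distance = minimum weight of a nonzero codeword.
    return min(sum(1 for s in w if s) for w in rs_codewords(p, r) if any(w))


p, r = 7, 2
n = p - 1
print(n, r, minimum_distance(p, r), n - r + 1)  # 6 2 5 5: the Singleton bound is met
```

Brute force is only feasible for tiny fields; it is there to make the MDS property concrete, not to stand in for the efficient algorithms described above.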
Submitted 18 February, 2019;
originally announced February 2019.
-
Entanglement-assisted quantum error-correcting codes from units
Authors:
Ted Hurley,
Donny Hurley,
Barry Hurley
Abstract:
Entanglement-assisted quantum error-correcting codes (EAQECCs) to desired rate, error-correcting capability and maximum shared entanglement are constructed. Thus for a required rate $R$, required error-correcting capability to correct $t$ errors, mds (maximum distance separable) EAQECCs of the form $[[n,r,d;c]]$ with $R=\frac{r}{n}, d\geq (2t+1), c = (n-r), d= (n-r+1)$ are constructed. Series of such codes may be constructed where the rate and the relative distance approach non-zero constants as $n$ approaches infinity. The codes may also be constructed over prime order fields in which modular arithmetic may be employed.
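The parameter relations stated above reduce to arithmetic. With $R = r/n$ and $d = n - r + 1 \ge 2t + 1$, the length must satisfy $n(1-R) \ge 2t$, and $nR$ must be an integer. A small sketch that returns the shortest $[[n,r,d;c]]$ meeting those constraints; `eaqecc_parameters` is a hypothetical helper that only evaluates the stated relations and assumes nothing about field size or how the code is actually constructed.

```python
from fractions import Fraction


def eaqecc_parameters(rate: Fraction, t: int):
    """Shortest [[n, r, d; c]] with r/n = rate, d = n - r + 1, c = n - r and d >= 2t + 1."""
    if not 0 < rate < 1:
        raise ValueError("rate must lie strictly between 0 and 1")
    if t < 1:
        raise ValueError("t must be a positive number of errors")
    a, b = rate.numerator, rate.denominator  # rate = a/b in lowest terms
    # n must be a multiple k*b so that r = k*a is an integer;
    # then n - r = k*(b - a) must be at least 2t (integer ceiling division).
    k = -(-(2 * t) // (b - a))
    n, r = k * b, k * a
    return n, r, n - r + 1, n - r


print(eaqecc_parameters(Fraction(1, 2), 3))  # (12, 6, 7, 6)
print(eaqecc_parameters(Fraction(2, 3), 2))  # (12, 8, 5, 4)
```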
Submitted 28 June, 2018;
originally announced June 2018.
-
Quantum error-correcting codes: the unit-derived strategy
Authors:
Ted Hurley,
Donny Hurley,
Barry Hurley
Abstract:
Series of maximum distance quantum error-correcting codes are developed and analysed. For a given rate and given error-correction capability, quantum error-correcting codes with these specifications are constructed. The codes are explicit with efficient decoding algorithms. For a given field, maximum-length quantum codes are constructed.
Submitted 3 August, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R
Authors:
Mark O'Connell,
Catherine B. Hurley,
Katarina Domijan
Abstract:
The condvis package is for interactive visualization of sections in data space, showing fitted models on the section, and observed data near the section. The primary goal is the interpretation of complex models, and showing how the observed data support the fitted model. There is a video accompaniment to this paper available at https://www.youtube.com/watch?v=rKFq7xwgdX0. This is a preprint version of an article to appear in the Journal of Statistical Software.
Submitted 2 October, 2016;
originally announced October 2016.
-
Elastic Solver: Balancing Solution Time and Energy Consumption
Authors:
Barry Hurley,
Deepak Mehta,
Barry O'Sullivan
Abstract:
Combinatorial decision problems arise in many different domains such as scheduling, routing, packing, bioinformatics, and many more. Despite recent advances in developing scalable solvers, there are still many problems which are often very hard to solve. Typically the most advanced solvers include elements which are stochastic in nature. If the same instance is solved many times using different seeds then, depending on the inherent characteristics of a problem instance and the solver, one can observe a highly variable distribution of times spanning multiple orders of magnitude. Therefore, to solve a problem instance efficiently it is often useful to solve the same instance in parallel with different seeds. With the proliferation of cloud computing, it is natural to think about an elastic solver which can scale up by launching searches in parallel on thousands of machines (or cores). However, this could result in consuming a lot of energy. Moreover, not every instance would require thousands of machines. The challenge is to resolve the tradeoff between solution time and energy consumption optimally for a given problem instance. We analyse the impact of the number of machines (or cores) not only on solution time but also on energy consumption. We highlight that although solution time always drops as the number of machines increases, the relation between the number of machines and energy consumption is more complicated. In many cases, the optimal energy consumption may be achieved by a middle ground; we analyse this relationship in detail. The tradeoff between solution time and energy consumption is studied further, showing that the energy consumption of a solver can be reduced drastically if we increase the solution time marginally. We also develop a prediction model, demonstrating that such insights can be exploited to achieve faster solution times in a more energy-efficient manner.
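A toy model of the time/energy trade-off, under assumptions that are ours rather than the paper's: each of $k$ machines runs an independent seed, runtimes are drawn from an empirical sample, all machines stop as soon as the first finishes, and energy is proportional to machines × wall-clock time. For sorted runtimes $x_{(1)} \le \dots \le x_{(m)}$, the expected minimum of $k$ draws is $\sum_{i=1}^{m} x_{(i)}\,[((m-i+1)/m)^k - ((m-i)/m)^k]$. Both function names are hypothetical.

```python
from typing import List, Sequence


def expected_parallel_time(runtimes: Sequence[float], k: int) -> float:
    """Expected wall-clock time when k seeds race and the fastest one wins.

    Each seed's runtime is drawn independently (with replacement) from the
    empirical sample `runtimes`.
    """
    xs = sorted(runtimes)
    m = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        # i is 0-based: ((m - i)/m)**k is the chance all k draws land at position i or later.
        total += x * (((m - i) / m) ** k - ((m - i - 1) / m) ** k)
    return total


def energy_curve(runtimes: Sequence[float], max_machines: int) -> List[float]:
    """Energy (machines x expected wall-clock time) for 1..max_machines machines."""
    return [k * expected_parallel_time(runtimes, k) for k in range(1, max_machines + 1)]


# Heavy-tailed toy sample: most seeds finish in 1 s, an unlucky one takes 1000 s.
sample = [1.0, 1.0, 1000.0]
energy = energy_curve(sample, 20)
best_k = 1 + energy.index(min(energy))
print(best_k)  # 8: energy is minimised by a middle number of machines
```

In this model the expected time never increases with $k$, while energy first falls (the long tail is cut off) and then rises (extra machines burn power without saving time), which is the middle-ground behaviour described above.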
Submitted 23 May, 2016;
originally announced May 2016.
-
Transformation-based Feature Computation for Algorithm Portfolios
Authors:
Barry Hurley,
Serdar Kadioglu,
Yuri Malitsky,
Barry O'Sullivan
Abstract:
Instance-specific algorithm configuration and algorithm portfolios have been shown to offer significant improvements over single-algorithm approaches in a variety of application domains. In the SAT and CSP domains, algorithm portfolios have consistently dominated the main competitions in these fields for the past five years. For a portfolio approach to be effective, two crucial conditions must be met. First, there needs to be a collection of complementary solvers with which to make a portfolio. Second, there must be a collection of problem features that can accurately identify structural differences between instances. This paper focuses on the latter issue, feature representation, because, unlike SAT, not every problem has well-studied features. We employ the well-known SATzilla feature set, but compute alternative sets on different SAT encodings of CSPs. We show that regardless of what encoding is used to convert the instances, adequate structural information is maintained to differentiate between problem instances, and that this can be exploited to make an effective portfolio-based CSP solver.
Submitted 10 January, 2014;
originally announced January 2014.
-
Proteus: A Hierarchical Portfolio of Solvers and Transformations
Authors:
Barry Hurley,
Lars Kotthoff,
Yuri Malitsky,
Barry O'Sullivan
Abstract:
In recent years, portfolio approaches to solving SAT problems and CSPs have become increasingly common. There are also a number of different encodings for representing CSPs as SAT instances. In this paper, we leverage advances in both SAT and CSP solving to present a novel hierarchical portfolio-based approach to CSP solving, which we call Proteus, that does not rely purely on CSP solvers. Instead, it may decide that it is best to encode a CSP problem instance into SAT, selecting an appropriate encoding and a corresponding SAT solver. Our experimental evaluation used an instance of Proteus that involved four CSP solvers, three SAT encodings, and six SAT solvers, evaluated on the most challenging problem instances from the CSP solver competitions, involving global and intensional constraints. We show that Proteus achieves significant performance improvements by exploiting alternative viewpoints and solvers for combinatorial problem solving.
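A hierarchical portfolio decides in levels: first CSP versus SAT, then which CSP solver, or which encoding followed by which SAT solver. The sketch mimics that shape with a k-nearest-neighbour majority vote at each level over instance feature vectors. The solver and encoding names, the features, and the k-NN selector are all hypothetical placeholders, not Proteus's actual components or learning method.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A choice is a path through the hierarchy, e.g. ("CSP", "csp_solver_a")
# or ("SAT", "encoding_1", "sat_solver_x"). All names are hypothetical.
Choice = Tuple[str, ...]
Record = Tuple[Sequence[float], Choice]


def select(train: List[Record], features: Sequence[float], k: int = 3) -> Choice:
    """Descend the hierarchy one level at a time by k-NN majority vote.

    At each level only training instances whose best choice lies below the
    current path are considered; ties go to the option of the closest one.
    """
    path: Choice = ()
    while True:
        below = [(f, c) for f, c in train if len(c) > len(path) and c[: len(path)] == path]
        if not below:
            return path
        nearest = sorted(below, key=lambda rec: math.dist(rec[0], features))[:k]
        votes = Counter(c[len(path)] for _, c in nearest)
        top = max(votes.values())
        step = next(c[len(path)] for _, c in nearest if votes[c[len(path)]] == top)
        path = path + (step,)


train: List[Record] = [
    ((0.50, 0.50), ("CSP", "csp_solver_a")),
    ((0.60, 0.40), ("SAT", "encoding_1", "sat_solver_x")),
    ((0.65, 0.35), ("SAT", "encoding_2", "sat_solver_y")),
    ((0.10, 0.90), ("CSP", "csp_solver_b")),
    ((0.70, 0.30), ("SAT", "encoding_1", "sat_solver_y")),
]
print(select(train, (0.52, 0.48)))  # ('SAT', 'encoding_1', 'sat_solver_x')
print(select(train, (0.12, 0.88)))  # ('CSP', 'csp_solver_b')
```

With `k=1` the first query would go to `('CSP', 'csp_solver_a')` because the single closest instance is a CSP one; the vote at the top level is what lets the neighbourhood as a whole favour the SAT route.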
Submitted 17 February, 2014; v1 submitted 24 June, 2013;
originally announced June 2013.
-
Systems of MDS codes from units and idempotents
Authors:
Barry Hurley,
Ted Hurley
Abstract:
Algebraic systems are constructed from which series of maximum distance separable (mds) codes are derived. The methods use unit and idempotent schemes.
Submitted 23 January, 2013;
originally announced January 2013.
-
Paraunitary Matrices
Authors:
Barry Hurley,
Ted Hurley
Abstract:
Design methods for paraunitary matrices from complete orthogonal sets of idempotents and related matrix structures are presented. These include techniques for designing non-separable multidimensional paraunitary matrices. Properties of the structures are obtained and proofs given. Paraunitary matrices play a central role in signal processing, in particular in the areas of filterbanks and wavelets.
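One standard way idempotents yield paraunitary matrices: if $E_1,\dots,E_k$ are mutually orthogonal Hermitian idempotents summing to the identity, then $U(z) = \sum_i z^{a_i} E_i$ satisfies $U(z)U(z)^{*} = \sum_i |z|^{2a_i} E_i = I$ whenever $|z| = 1$. The sketch checks this numerically for a 2×2 example built from one vector; it illustrates the general principle and is not claimed to be the paper's specific design method. The helper names are hypothetical.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def projector(v: List[complex]) -> Matrix:
    """Hermitian idempotent E = v v* / (v* v) onto the span of v."""
    norm2 = sum(abs(x) ** 2 for x in v)
    return [[v[i] * v[j].conjugate() / norm2 for j in range(len(v))] for i in range(len(v))]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conj_transpose(a: Matrix) -> Matrix:
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def paraunitary(e1: Matrix, z: complex, power: int = 1) -> Matrix:
    """U(z) = E1 + z**power * E2, where E2 = I - E1 is the complementary idempotent."""
    n = len(e1)
    return [[e1[i][j] + z ** power * ((1 if i == j else 0) - e1[i][j]) for j in range(n)]
            for i in range(n)]


e1 = projector([1 + 0j, 2j])
for theta in (0.3, 1.7, 4.0):
    z = cmath.exp(1j * theta)  # a point on the unit circle
    u = paraunitary(e1, z, power=3)
    product = matmul(u, conj_transpose(u))
    error = max(abs(product[i][j] - (1 if i == j else 0)) for i in range(2) for j in range(2))
    print(f"theta={theta}: max |U U* - I| = {error:.1e}")
```

The deviation printed is at rounding level on the unit circle; off the circle (say $z = 2$) the product is $E_1 + |z|^2 E_2 \ne I$.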
Submitted 18 September, 2020; v1 submitted 3 May, 2012;
originally announced May 2012.
-
The Hadamard circulant conjecture
Authors:
Barry Hurley,
Paul Hurley,
Ted Hurley
Abstract:
It is shown that if $H$ is a $4n\times 4n$ circulant Hadamard matrix, then $n=1$. This proves the Hadamard circulant conjecture.
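A circulant matrix is determined by its first row, and a ±1 circulant matrix $H$ of order $m$ is Hadamard ($HH^{T} = mI$) exactly when that row has zero periodic autocorrelation at every nonzero shift. The brute-force search below checks small orders only; it illustrates the statement for $n = 1, 2, 3$ and is in no sense a proof. The function names are hypothetical.

```python
from itertools import product
from typing import Sequence


def is_circulant_hadamard(row: Sequence[int]) -> bool:
    """True if the circulant matrix with this ±1 first row is Hadamard."""
    m = len(row)
    # Row s is the first row cyclically shifted by s. Orthogonality of rows 0 and s is zero
    # periodic autocorrelation at shift s; any other pair of rows reduces to one of these shifts.
    return all(sum(row[i] * row[(i + s) % m] for i in range(m)) == 0 for s in range(1, m))


def count_circulant_hadamard(m: int) -> int:
    """Number of ±1 first rows of length m giving a circulant Hadamard matrix."""
    return sum(is_circulant_hadamard(row) for row in product((1, -1), repeat=m))


for m in (4, 8, 12):
    print(m, count_circulant_hadamard(m))  # prints: 4 8 / 8 0 / 12 0
```

Order 4 admits eight such first rows (one entry differing in sign from the other three), while orders 8 and 12 admit none, consistent with $n = 1$.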
Submitted 4 September, 2011;
originally announced September 2011.
-
Group ring cryptography
Authors:
Barry Hurley,
Ted Hurley
Abstract:
Cryptographic systems are derived using units in group rings. Combinations of types of units in group rings give units not of any particular type. This includes cases of taking powers of units and products of such powers and adds the complexity of the {\em discrete logarithm} problem to the system.
The method enables encryption and (error-correcting) coding to be combined within one system. These group ring cryptographic systems may be combined in a neat way with existing cryptographic systems, such as RSA, and a combination has the combined strength of both systems. Examples are given.
Submitted 9 April, 2011;
originally announced April 2011.
-
New Rotation Periods in the Open Cluster NGC 1039 (M 34), and a Derivation of its Gyrochronology Age
Authors:
David J. James,
Sydney A. Barnes,
Soren Meibom,
Wesley Lockwood,
Stephen E. Levine,
Constantine Deliyannis,
Imants Platais,
Aaron Steinhauer,
Briana K. Hurley
Abstract:
Employing photometric rotation periods for solar-type stars in NGC 1039 [M 34], a young, nearby open cluster, we use its mass-dependent rotation period distribution to derive the cluster's age in a distance-independent way, i.e., the so-called gyrochronology method. We present an analysis of 55 new rotation periods, using light curves derived from differential photometry, for solar-type stars in M 34. We also exploit the results of a recently completed, standardized, homogeneous BVIc CCD survey of the cluster in order to establish photometric cluster membership and assign B-V colours to each photometric variable. We describe a methodology for establishing the gyrochronology age for an ensemble of solar-type stars. Empirical relations between rotation period, photometric colour and stellar age (gyrochronology) are used to determine the age of M 34. Based on its position in a colour-period diagram, each M 34 member is designated as being either a solid-body rotator (interface or I-star), a differentially rotating star (convective or C-star) or an object which is in some transitory state in between the two (gap or g-star). Fitting the period and photometric colour of each I-sequence star in the cluster, we derive the cluster's mean gyrochronology age.
Of the 55 photometric variables, 47 lie along the loci of the cluster main sequence in V/B-V and V/V-I space. We are further able to confirm kinematic membership of the cluster for half of the periodic variables [21/55], employing results from an on-going radial velocity survey of the cluster. For each cluster member identified as an I-sequence object in the colour-period diagram, we derive its individual gyrochronology age; the mean gyro age of M 34 is found to be 193 +/- 9 Myr, formally consistent (within the errors) with that derived using several distance-dependent, photometric isochrone methods (250 +/- 67 Myr).
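A common functional form for gyrochronology relations is $P = a\,[(B-V)_0 - c]^{b}\,t^{n}$, with rotation period $P$ in days and age $t$ in Myr. Each I-sequence star can then be inverted for an individual age, and the ensemble summarised by a mean and its standard error. In the sketch below the coefficients are placeholders of plausible magnitude, not the calibration used in the paper. The stars are invented, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

# Placeholder coefficients for P = A * (bv - C)**B * age**N (P in days, age in Myr).
# These are illustrative assumptions, not the paper's calibration.
A, B, C, N = 0.77, 0.60, 0.40, 0.52


def period_from_age(bv: float, age_myr: float) -> float:
    return A * (bv - C) ** B * age_myr ** N


def age_from_period(bv: float, period_days: float) -> float:
    """Invert the relation for one star; requires a colour redder than C."""
    if bv <= C:
        raise ValueError("colour index must exceed C for this relation")
    return (period_days / (A * (bv - C) ** B)) ** (1.0 / N)


def ensemble_age(colours: Sequence[float], periods: Sequence[float]) -> Tuple[float, float]:
    """Mean individual age and its standard error for a set of I-sequence stars."""
    ages = [age_from_period(bv, p) for bv, p in zip(colours, periods)]
    return mean(ages), stdev(ages) / math.sqrt(len(ages))


# Invented stars whose periods were generated from ages scattered around 200 Myr.
colours = [0.55, 0.65, 0.75, 0.85, 0.95]
periods = [period_from_age(bv, age) for bv, age in zip(colours, [190, 205, 198, 210, 195])]
print(ensemble_age(colours, periods))
```

Because no distance enters the relation, the ensemble age depends only on colours and periods, which is what makes the method distance-independent.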
Submitted 31 March, 2010;
originally announced April 2010.