-
What makes Models Compositional? A Theoretical View: With Supplement
Authors:
Parikshit Ram,
Tim Klinger,
Alexander G. Gray
Abstract:
Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often highlight failures of existing models, but it is not clear why these models fail in this way. In this paper, we seek to theoretically understand the role the compositional structure of the models plays in these failures and how this structure relates to their expressivity and sample complexity. We propose a general neuro-symbolic definition of compositional functions and their compositional complexity. We then show how various existing general and special purpose sequence processing models (such as recurrent, convolutional and attention-based ones) fit this definition and use it to analyze their compositional complexity. Finally, we provide theoretical guarantees for the expressivity and systematic generalization of compositional models that explicitly depend on our proposed definition and highlight factors which drive poor empirical performance.
Submitted 2 May, 2024;
originally announced May 2024.
-
Debugging Trait Errors as Logic Programs
Authors:
Gavin Gray,
Will Crichton
Abstract:
Rust uses traits to define units of shared behavior. Trait constraints build up an implicit set of first-order hereditary Harrop clauses which is executed by a powerful logic programming engine in the trait system. But that power comes at a cost: the number of traits in Rust libraries is increasing, which puts a growing burden on the trait system to help programmers diagnose errors. Beyond a certain size of trait constraints, compiler diagnostics fall off the edge of a complexity cliff, leading to useless error messages. Crate maintainers have created ad-hoc solutions to diagnose common domain-specific errors, but the problem of diagnosing trait errors in general is still open. We propose a trait debugger as a means of getting developers the information necessary to diagnose trait errors in any domain and at any scale. Our proposed tool will extract proof trees from the trait solver, and it will interactively visualize these proof trees to facilitate debugging of trait errors.
Submitted 10 September, 2023;
originally announced September 2023.
-
A Grounded Conceptual Model for Ownership Types in Rust
Authors:
Will Crichton,
Gavin Gray,
Shriram Krishnamurthi
Abstract:
Programmers learning Rust struggle to understand ownership types, Rust's core mechanism for ensuring memory safety without garbage collection. This paper describes our attempt to systematically design a pedagogy for ownership types. First, we studied Rust developers' misconceptions of ownership to create the Ownership Inventory, a new instrument for measuring a person's knowledge of ownership. We found that Rust learners could not connect Rust's static and dynamic semantics, such as determining why an ill-typed program would (or would not) exhibit undefined behavior. Second, we created a conceptual model of Rust's semantics that explains borrow checking in terms of flow-sensitive permissions on paths into memory. Third, we implemented a Rust compiler plugin that visualizes programs under the model. Fourth, we integrated the permissions model and visualizations into a broader pedagogy of ownership by writing a new ownership chapter for The Rust Programming Language, a popular Rust textbook. Fifth, we evaluated an initial deployment of our pedagogy against the original version, using reader responses to the Ownership Inventory as a point of comparison. Thus far, the new pedagogy has improved learner scores on the Ownership Inventory by an average of 9% ($N = 342, d = 0.56$).
Submitted 8 September, 2023;
originally announced September 2023.
-
Investigating biomechanical determinants of endothelial permeability in a modified hollow fibre bioreactor
Authors:
Stephen G Gray,
Peter D Weinberg
Abstract:
Effects of mechanical stress on the permeability of vascular endothelium are important to normal physiology and may be critical in the development of atherosclerosis, where they can account for the patchy arterial distribution of the disease. Such properties are frequently investigated in vitro. Here we evaluate and use the hollow fibre bioreactor for this purpose; in this system, endothelial cells form a confluent monolayer lining numerous plastic capillaries with porous walls, contained in a cartridge. The capillaries were perfused with a near-aortic waveform by an external pump, and permeability was assessed by the movement of rhodamine-labelled albumin from the intracapillary space to the extracapillary space. Confluence and quiescence of the cells were confirmed by electron microscopy and measurements of glucose consumption and permeability. The system was able to detect previously established influences on permeability: tracer transport was increased by acute application of shear stress and decreased by chronic shear stress compared to a static control, and was increased by thrombin or an NO synthase inhibitor under chronic shear. Increasing viscosity by addition of xanthan gum reduced permeability under both acute and chronic shear. Addition of damping chambers to reduce flow pulsatility increased permeability. Modifying the cartridge to allow chronic convection across the monolayer increased effective permeability more than could be explained by the addition of convective transport alone, indicating that it caused an increase in permeability. The off-the-shelf hollow fibre bioreactor provides an excellent system for investigating the biomechanics of endothelial permeability and its potential is increased by simple modifications.
Submitted 2 March, 2023;
originally announced March 2023.
-
Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection
Authors:
Parikshit Ram,
Alexander G. Gray,
Horst C. Samulowitz,
Gregory Bramble
Abstract:
We present, to our knowledge, the first theoretical treatments of two common questions in cross-validation based hyperparameter selection: (1) After selecting the best hyperparameter using a held-out set, we train the final model using all of the training data; since this may or may not improve future generalization error, should one do this? (2) During optimization such as via SGD (stochastic gradient descent), we must set the optimization tolerance $ρ$; since it trades off predictive accuracy with computation cost, how should one set it? Toward these problems, we introduce the hold-in risk (the error due to not using the whole training data) and the model class mis-specification risk (the error due to having chosen the wrong model class) in a theoretical view which is simple, general, and suggests heuristics that can be used when faced with a dataset instance. In proof-of-concept studies in synthetic data where theoretical quantities can be controlled, we show that these heuristics can, respectively, (1) always perform at least as well as always performing retraining or never performing retraining, (2) either improve performance or reduce computational overhead by $2\times$ with no loss in predictive performance.
Submitted 12 January, 2023;
originally announced January 2023.
-
Deformation and dislocation evolution in body-centered-cubic single- and polycrystal tantalum
Authors:
Seunghyeon Lee,
Hansohl Cho,
Curt A. Bronkhorst,
Reeju Pokharel,
Donald W. Brown,
Bjørn Clausen,
Sven C. Vogel,
Veronica Anghel,
George T. Gray III,
Jason R. Mayeur
Abstract:
A physically-informed continuum crystal plasticity model is presented to elucidate the deformation mechanisms and dislocation evolution in body-centered-cubic (bcc) tantalum widely used as a key structural material for mechanical and thermal extremes. We show our unified structural modeling framework informed by mesoscopic dislocation dynamics simulations is capable of capturing salient features of the large inelastic behavior of tantalum at quasi-static (10$^{-3}$ s$^{-1}$) to extreme strain rates (5000 s$^{-1}$) and at room temperature and higher (873K) at both single- and polycrystal levels. We also present predictive capabilities of our model for microstructural evolution in the material. To this end, we investigate the effects of dislocation interactions on slip activities, instability and strain-hardening behavior at the single crystal level. Furthermore, ex situ measurements on crystallographic texture evolution and dislocation density growth are carried out for the polycrystal tantalum specimens at increasing strains. Numerical simulation results also support that our modeling framework is capable of capturing the main features of the polycrystal behavior over a wide range of strains, strain rates and temperatures. The theoretical, experimental and numerical results at both single- and polycrystal levels provide critical insight into the underlying physical pictures for micro- and macroscopic responses and their relations in this important class of refractory bcc materials undergoing severe inelastic deformations.
Submitted 30 September, 2021;
originally announced September 2021.
-
Integral equation models for solvent in macromolecular crystals
Authors:
Jonathon G. Gray,
George M. Giambaşu,
David A. Case,
Tyler Luchko
Abstract:
Solvent can occupy up to ~70% of macromolecular crystals, and hence having models that predict solvent distributions in periodic systems could improve the interpretation of crystallographic data. Yet there are few implicit solvent models applicable to periodic solutes, while crystallographic structures are commonly solved assuming a flat solvent model. Here we present a newly-developed periodic version of the 3D-RISM integral equation method that is able to efficiently solve for and accurately describe water and ion distributions in periodic systems; the code can compute accurate gradients that can be used in minimizations or molecular dynamics simulations. The new method includes an extension of the OZ equation needed to yield charge neutrality for charged solutes, which requires an additional contribution to the excess chemical potential that has not been previously identified; this is an important consideration for nucleic acids or any other charged system where most or all of the counter- and co-ions are part of the "disordered" solvent. We present several calculations of protein, RNA and small molecule crystals to show that X-ray scattering intensities and solvent structure predicted by the periodic 3D-RISM solvent model are in closer agreement with experiment than are intensities computed using the default flat solvent model in the refmac5 or phenix refinement programs, with the greatest improvement in the 2 to 4 Å range. Prospects for incorporating integral equation models into crystallographic refinement are discussed.
Submitted 7 September, 2021;
originally announced September 2021.
-
SeaNet -- Towards A Knowledge Graph Based Autonomic Management of Software Defined Networks
Authors:
Qianru Zhou,
Alasdair J. G. Gray,
Stephen McLaughlin
Abstract:
Automatic network management driven by Artificial Intelligence technologies has been hotly discussed for decades. However, current reports mainly focus on theoretical proposals and architecture designs; works on practical implementations on real-life networks are yet to appear. This paper presents our effort toward the implementation of a knowledge graph driven approach for autonomic network management in software defined networks (SDNs), termed SeaNet. Driven by the ToCo ontology, SeaNet is reprogrammed based on Mininet (an SDN emulator). It consists of three core components: a knowledge graph generator, a SPARQL engine, and a network management API. The knowledge graph generator represents the knowledge in telecommunication network management tasks as a formally represented, ontology driven model. Expert experience and network management rules can be formalized into the knowledge graph and automatically inferred by the SPARQL engine. The network management API is able to package technology-specific details and expose technology-independent interfaces to users. Experiments are carried out to evaluate the proposed work by comparing it with Ryu, a commercial SDN controller implemented in the same language, Python. The evaluation results show that SeaNet is considerably faster than Ryu in most circumstances and that the SeaNet code is significantly more compact. Benefiting from RDF reasoning, SeaNet is able to achieve O(1) time complexity on different scales of the knowledge graph, while the traditional database can achieve O(n log n) at best. With the developed network management API, SeaNet enables researchers to develop semantic-intelligent applications on their own SDNs.
Submitted 27 May, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
SARA -- A Semantic Access Point Resource Allocation Service for Heterogenous Wireless Networks
Authors:
Qianru Zhou,
Alasdair J. G. Gray,
Dimitrios Pezaros,
Stephen McLaughlin
Abstract:
In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogenous wireless networks in which various wireless access technologies exist together. By automatically reasoning on the knowledge base of the full system provided by SEANET, a knowledge based autonomic network management system, SARA selects the access point providing the best quality of service among the different access technologies. Because SEANET is an ontology assisted knowledge based system, SARA can also automatically adapt the access point selection strategy according to customer defined rules. Results of our evaluation, based on emulated networks with hybrid access technologies and various scales, show that SARA evidently improves the channel condition in terms of throughput. Comparisons with current AP selection algorithms demonstrate that SARA outperforms them. The time overhead is reasonable, and SARA is shown to be faster than traditional access point selection approaches.
Submitted 11 November, 2020;
originally announced November 2020.
-
Solving Constrained CASH Problems with ADMM
Authors:
Parikshit Ram,
Sijia Liu,
Deepak Vijaykeerthi,
Dakuo Wang,
Djallel Bouneffouf,
Greg Bramble,
Horst Samulowitz,
Alexander G. Gray
Abstract:
The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization framework to decompose CASH into multiple small problems and demonstrate how ADMM facilitates incorporation of black-box constraints.
Submitted 10 July, 2020; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Improved Hodgkin & Huxley-type model for action potentials in squid
Authors:
P. J. Stiles,
C. G. Gray
Abstract:
By extending the crude Goldman-Hodgkin-Katz electrodiffusion model for resting-state membrane potentials in perfused giant axons of squid, we reformulate the Hodgkin-Huxley (HH) phenomenological quantitative model to create a new model which is simpler and based more fundamentally on electrodiffusion principles. Our dynamical system, like that of HH, behaves as a 4-dimensional resonator exhibiting subthreshold oscillations. The predicted speed of propagating action potentials at 20 degrees Celsius is in good agreement with the HH experimental value at 18.5 degrees Celsius. After the external concentration of calcium ions is reduced, the generation of repetitive rebound action potentials is predicted by our model, in agreement with experiment, when the membrane is stimulated by a brief (0.1 ms) depolarizing current. Unlike the HH model, our model predicts, in agreement with experiment, that prolonged constant-current stimulation does not generate spike trains in perfused axons. Our resonator model predicts rebound spiking following prolonged hyperpolarizing stimulation, observed at 18.5 degrees Celsius by HH but not predicted at this temperature by their quantitative model. Spiking promoted by brief hyperpolarization is also predicted, at room temperature, by our electrodiffusion model, but only at much lower temperatures (ca. 6 degrees Celsius) by the HH model. We discuss qualitatively, more completely than do HH, temperature dependences of the various physical effects which determine resting and action potentials.
Submitted 5 February, 2020; v1 submitted 14 August, 2019;
originally announced August 2019.
-
BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
Authors:
Jack Turner,
Elliot J. Crowley,
Michael O'Boyle,
Amos Storkey,
Gavin Gray
Abstract:
The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equally; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). Code is available at https://github.com/BayesWatch/pytorch-blockswap.
Submitted 23 January, 2020; v1 submitted 10 June, 2019;
originally announced June 2019.
-
Separable Layers Enable Structured Efficient Linear Substitutions
Authors:
Gavin Gray,
Elliot J. Crowley,
Amos Storkey
Abstract:
In response to the development of recent efficient dense layers, this paper shows that something as simple as replacing linear components in pointwise convolutions with structured linear decompositions also produces substantial gains in the efficiency/accuracy tradeoff. Pointwise convolutions are fully connected layers and are thus prepared for replacement by structured transforms. Networks using such layers are able to learn the same tasks as those using standard convolutions, and provide Pareto-optimal benefits in efficiency/accuracy, both in terms of computation (mult-adds) and parameter count (and hence memory). Code is available at https://github.com/BayesWatch/deficient-efficient.
Submitted 3 June, 2019;
originally announced June 2019.
-
Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams
Authors:
Qianru Zhou,
Stephen McLaughlin,
Alasdair J. G. Gray,
Shangbin Wu,
Chengxiang Wang
Abstract:
Early detection of significant traumatic events, e.g. a terrorist attack or a ship capsizing, is important to ensure that a prompt emergency response can occur. In the modern world, telecommunication systems could play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper, a methodology is illustrated to detect such incidents immediately (with a delay on the order of milliseconds) by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information about the position and status of phones is encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of a passenger cruise ship capsizing. However, the approach is readily translatable to other incidents. Our evaluation results show that with a properly chosen window size, such incidents can be detected efficiently and effectively.
Submitted 13 March, 2019;
originally announced March 2019.
-
A geometric state function for two-fluid flow in porous media
Authors:
James E. McClure,
Ryan T. Armstrong,
Mark A. Berrill,
Steffen Schlüter,
Steffen Berg,
William G. Gray,
Cass T. Miller
Abstract:
Models that describe two-fluid flow in porous media suffer from a widely-recognized problem that the constitutive relationships used to predict capillary pressure as a function of the fluid saturation are non-unique, thus requiring a hysteretic description. As an alternative to the traditional perspective, we consider a geometrical description of the capillary pressure, which relates the average mean curvature, the fluid saturation, the interfacial area between fluids, and the Euler characteristic. The state equation is formulated using notions from algebraic topology and cast in terms of measures of the macroscale state. Synchrotron-based X-ray micro-computed tomography (μCT) and high-resolution pore-scale simulation is applied to examine the uniqueness of the proposed relationship for six different porous media. We show that the geometric state function is able to characterize the microscopic fluid configurations that result from a wide range of simulated flow conditions in an averaged sense. The geometric state function can serve as a closure relationship within macroscale models to effectively remove hysteretic behavior attributed to the arrangement of fluids within a porous medium. This provides a critical missing component needed to enable a new generation of higher fidelity models to describe two-fluid flow in porous media.
Submitted 24 May, 2018;
originally announced May 2018.
-
Nonlinear Electrostatics. The Poisson-Boltzmann Equation
Authors:
C. G. Gray,
P. J. Stiles
Abstract:
The description of a conducting medium in thermal equilibrium, such as an electrolyte solution or a plasma, involves nonlinear electrostatics, a subject rarely discussed in the standard electricity and magnetism textbooks. We consider in detail the case of the electrostatic double layer formed by an electrolyte solution near a uniformly charged wall, and we use mean-field or Poisson-Boltzmann (PB) theory to calculate the mean electrostatic potential and the mean ion concentrations, as functions of distance from the wall. PB theory is developed from the Gibbs variational principle for thermal equilibrium of minimizing the system free energy. We clarify the key issue of which free energy (Helmholtz, Gibbs, grand,...) should be used in the Gibbs principle; this turns out to depend not only on the specified conditions in the bulk electrolyte solution (e.g., fixed volume or fixed pressure), but also on the specified surface conditions, such as fixed surface charge or fixed surface potential. Despite its nonlinearity the PB equation for the mean electrostatic potential can be solved analytically for planar or wall geometry, and we present analytic solutions for both a full electrolyte, and for an ionic solution which contains only counterions, i.e. ions of sign opposite to that of the wall charge. This latter case has some novel features. We also use the free energy to discuss the inter-wall forces which arise when the two parallel charged walls are sufficiently close to permit their double layers to overlap. We consider situations where the two walls carry equal charges, and where they carry equal and opposite charges.
Submitted 6 March, 2018;
originally announced March 2018.
-
Moonshine: Distilling with Cheap Convolutions
Authors:
Elliot J. Crowley,
Gavin Gray,
Amos Storkey
Abstract:
Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
Submitted 17 January, 2019; v1 submitted 7 November, 2017;
originally announced November 2017.
-
Stiff-spring approximation revisited: inertial effects in non-equilibrium trajectories
Authors:
Mostafa Nategholeslam,
C. G. Gray,
Bruno Tomberli
Abstract:
Use of harmonic guiding potentials is the most common method for implementing steered molecular dynamics (SMD) simulations, performed to obtain potentials of mean force (PMFs) of molecular systems using non-equilibrium work (NEW) theorems. Harmonic guiding potentials are also the natural choice in single molecule force spectroscopy experiments. The stiff spring approximation (SSA) of Schulten and coworkers makes it possible to use the work performed along SMD trajectories to obtain the PMF.
We discuss and demonstrate how a high spring constant, k, required for the validity of the SSA can violate another requirement of this theory, i.e., the validity of Brownian dynamics of the system. Violation of the Brownian condition results in the introduction of kinetic energy contributions to the external work, performed during SMD simulations. These inertial effects result in skewed work distributions, rather than the Gaussian distributions predicted by SSA. The inertial effects also result in broader work distributions, which worsen the effect of the skewness when calculating work averages. Remarkably, our results strongly suggest that the skew and width of work distributions are independent of the average drift velocity and physical asymmetries.
The skew and broadening of work distributions result in biased estimation of the PMF. The bias manifests itself in the form of a systematic error that increases with simulation time. We discuss the proper upper limit for k, such that the inertial effects are avoided. This limit, used together with the relation for the lower limit of k, makes it possible to conduct accurate steering while satisfying the Brownian dynamics. Furthermore, we argue and demonstrate that using the peak-value (rather than the statistical mean) of the work distributions vastly reduces the bias in the calculated PMFs and improves the accuracy.
Submitted 25 July, 2016;
originally announced July 2016.
-
McMillan-Mayer Theory of Solutions Revisited: Simplifications and Extensions
Authors:
Shaghayegh Vafaei,
Bruno Tomberli,
C. G. Gray
Abstract:
McMillan and Mayer (MM) proved two remarkable theorems in their paper on the equilibrium statistical mechanics of liquid solutions. They first showed that the grand canonical partition function for a solution can be reduced to one with an effectively solute-only form, by integrating out the solvent degrees of freedom. The total effective solute potential in the effective solute grand partition function can be decomposed into components which are potentials of mean force for isolated groups of one, two, three, etc., solute molecules. Secondly, from the first result, now assuming low solute concentration, MM derived an expansion for the osmotic pressure in powers of the solute concentration, in complete analogy with the virial expansion of gas pressure in powers of the density at low density. The molecular expressions found for the osmotic virial coefficients have exactly the same form as the corresponding gas virial coefficients, with potentials of mean force replacing vacuum potentials. In this paper we restrict ourselves to binary liquid solutions with solute species $A$ and solvent species $B$ and do three things: (a) by working with a semi-grand canonical ensemble (grand with respect to solvent only) instead of the grand canonical ensemble used by MM, and avoiding graphical methods, we have greatly simplified the derivation of the first MM result; (b) by using a simple nongraphical method developed by van Kampen for gases, we have greatly simplified the derivation of the second MM result, i.e., the osmotic pressure virial expansion; as a by-product, we show the precise relation between MM theory and Widom potential distribution theory; and (c) we have extended MM theory by deriving virial expansions for other solution properties such as the enthalpy of mixing. The latter expansion, with changed independent variables corresponding to current experiments, is proving useful.
Submitted 18 June, 2014; v1 submitted 5 June, 2014;
originally announced June 2014.
-
Modeling an Augmented Lagrangian for Blackbox Constrained Optimization
Authors:
Robert B. Gramacy,
Genetha A. Gray,
Sebastien Le Digabel,
Herbert K. H. Lee,
Pritam Ranjan,
Garth Wells,
Stefan M. Wild
Abstract:
Constrained blackbox optimization is a difficult problem, with most approaches coming from the mathematical programming literature. The statistical literature is sparse, especially in addressing problems with nontrivial constraints. This situation is unfortunate because statistical methods have many attractive properties: global scope, handling noisy objectives, sensitivity analysis, and so forth. To narrow that gap, we propose a combination of response surface modeling, expected improvement, and the augmented Lagrangian numerical optimization framework. This hybrid approach allows the statistical model to think globally and the augmented Lagrangian to act locally. We focus on problems where the constraints are the primary bottleneck, requiring expensive simulation to evaluate and substantial modeling effort to map out. In that context, our hybridization presents a simple yet effective solution that allows existing objective-oriented statistical approaches, like those based on Gaussian process surrogates and expected improvement heuristics, to be applied to the constrained setting with minor modification. This work is motivated by a challenging, real-data benchmark problem from hydrology where, even with a simple linear objective function, learning a nontrivial valid region complicates the search for a global minimum.
Submitted 3 March, 2015; v1 submitted 19 March, 2014;
originally announced March 2014.
-
Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens
Authors:
Ravi Ganti,
Alexander G. Gray
Abstract:
In this paper we propose a multi-armed-bandit-inspired, pool-based active learning algorithm for the problem of binary classification. By carefully constructing an analogy between active learning and multi-armed bandits, we utilize ideas such as lower confidence bounds and self-concordant regularization from the multi-armed bandit literature to design our proposed algorithm. Our algorithm is a sequential algorithm, which in each round assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for the label of this sampled point. The design of this sampling distribution is also inspired by the analogy between active learning and multi-armed bandits. We show how to derive lower confidence bounds required by our algorithm. Experimental comparisons to previously proposed active learning algorithms show superior performance on some standard UCI datasets.
Submitted 26 September, 2013;
originally announced September 2013.
-
PAV ontology: Provenance, Authoring and Versioning
Authors:
Paolo Ciccarese,
Stian Soiland-Reyes,
Khalid Belhajjame,
Alasdair J G Gray,
Carole Goble,
Tim Clark
Abstract:
Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purpose and they allow and encourage extensions to cover more specific needs. We identify the specific need for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator.
We present the Provenance, Authoring and Versioning ontology (PAV): a lightweight ontology for capturing just enough descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV, illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the PROV-O ontology to support broader interoperability.
The authors strived to keep PAV lightweight and compact by including only those terms that have been demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible.
We analyze and compare PAV with related approaches, namely Provenance Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their differences with PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms.
Submitted 6 December, 2013; v1 submitted 26 April, 2013;
originally announced April 2013.
-
Tree-Independent Dual-Tree Algorithms
Authors:
Ryan R. Curtin,
William B. March,
Parikshit Ram,
David V. Anderson,
Alexander G. Gray,
Charles L. Isbell Jr
Abstract:
Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel, tighter pruning bound. The meta-algorithm also allows straightforward extensions to massively parallel settings.
Submitted 16 April, 2013;
originally announced April 2013.
-
The effect of shock-wave profile on dynamic brittle failure
Authors:
J. Pablo Escobedo,
Eric N. Brown,
Carl P. Trujillo,
Ellen K. Cerreta,
George T. Gray III
Abstract:
The influence of shock-wave-loading profile on the failure processes in a brittle material has been investigated. Tungsten heavy alloy (WHA) specimens have been subjected to two shock-wave loading profiles with a similar peak stress of 15.4 GPa but different pulse durations. Contrary to the strong dependence of strength on wave profile observed in ductile metals, for WHA, specimens subjected to different loading profiles exhibited similar spall strength and damage evolution morphology. Post-mortem examination of recovered samples revealed that dynamic failure for both loading profiles is dominated by brittle cleavage fracture, with additional energy dissipation through crack branching in the more brittle tungsten particles. Overall, in this brittle material all relevant damage kinetics and the spall strength are shown to be dominated by the shock peak stress, independent of pulse duration.
Submitted 2 March, 2013;
originally announced March 2013.
-
MLPACK: A Scalable C++ Machine Learning Library
Authors:
Ryan R. Curtin,
James R. Cline,
N. P. Slagle,
William B. March,
Parikshit Ram,
Nishant A. Mehta,
Alexander G. Gray
Abstract:
MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011 offering both a simple, consistent API accessible to novice users and high performance and flexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org.
Submitted 23 October, 2012;
originally announced October 2012.
-
Fast Exact Max-Kernel Search
Authors:
Ryan R. Curtin,
Parikshit Ram,
Alexander G. Gray
Abstract:
The wide applicability of kernels makes the problem of max-kernel search ubiquitous and more general than the usual similarity search in metric spaces. We focus on solving this problem efficiently. We begin by characterizing the inherent hardness of the max-kernel search problem with a novel notion of directional concentration. Following that, we present a method to use an $O(n \log n)$ algorithm to index any set of objects (points in $\mathbb{R}^d$ or abstract objects) directly in the Hilbert space without any explicit feature representations of the objects in this space. We present the first provably $O(\log n)$ algorithm for exact max-kernel search using this index. Empirical results for a variety of data sets as well as abstract objects demonstrate up to 4 orders of magnitude speedup in some cases. Extensions for approximate max-kernel search are also presented.
Submitted 26 October, 2012; v1 submitted 23 October, 2012;
originally announced October 2012.
-
Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL
Authors:
Nishant A. Mehta,
Dongryeol Lee,
Alexander G. Gray
Abstract:
Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxation of minimax MTL, we obtain a continuum of MTL formulations spanning minimax MTL and classical MTL. The full paradigm itself is loss-compositional, operating on the vector of empirical risks. It incorporates minimax MTL, its relaxations, and many new MTL formulations as special cases. We show theoretically that minimax MTL tends to avoid worst case outcomes on newly drawn test tasks in the learning to learn (LTL) test setting. The results of several MTL formulations on synthetic and real problems in the MTL and LTL test settings are encouraging.
Submitted 13 September, 2012;
originally announced September 2012.
-
Faster Gaussian Summation: Theory and Experiment
Authors:
Dongryeol Lee,
Alexander G. Gray
Abstract:
We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions - an O(Dp) Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error control scheme integrating any arbitrary approximation method - within the best discrete algorithmic framework using adaptive hierarchical data structures. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or fewer).
Submitted 27 June, 2012;
originally announced June 2012.
-
Fast Nonparametric Conditional Density Estimation
Authors:
Michael P. Holmes,
Alexander G. Gray,
Charles Lee Isbell
Abstract:
Conditional density estimation generalizes regression by modeling a full density f(y|x) rather than only the expected value E(y|x). This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or none from the machine learning community. None of that work has been applied to greater than bivariate data, presumably due to the computational difficulty of data-driven bandwidth selection. We describe the double kernel conditional density estimator and derive fast dual-tree-based algorithms for bandwidth selection using a maximum likelihood criterion. These techniques give speedups of up to 3.8 million in our experiments, and enable the first applications to previously intractable large multivariate datasets, including a redshift prediction problem from the Sloan Digital Sky Survey.
Submitted 20 June, 2012;
originally announced June 2012.
-
Maximum Inner-Product Search using Tree Data-structures
Authors:
Parikshit Ram,
Alexander G. Gray
Abstract:
The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in the literature. However, a closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally, we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited for the purpose of best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique.
Submitted 27 February, 2012;
originally announced February 2012.
-
On the Sample Complexity of Predictive Sparse Coding
Authors:
Nishant A. Mehta,
Alexander G. Gray
Abstract:
The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms recently have demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have not been studied. We establish the first generalization error bounds for predictive sparse coding, covering two settings: 1) the overcomplete setting, where the number of features k exceeds the original dimensionality d; and 2) the high or infinite-dimensional setting, where only dimension-free bounds are useful. Both learning bounds intimately depend on stability properties of the learned sparse encoder, as measured on the training sample. Consequently, we first present a fundamental stability result for the LASSO, a result characterizing the stability of the sparse codes with respect to perturbations to the dictionary. In the overcomplete setting, we present an estimation error bound that decays as $\tilde{O}(\sqrt{dk/m})$ with respect to d and k. In the high or infinite-dimensional setting, we show a dimension-free bound that is $\tilde{O}(\sqrt{k^2 s/m})$ with respect to k and s, where s is an upper bound on the number of non-zeros in the sparse code for any training data point.
Submitted 7 October, 2012; v1 submitted 17 February, 2012;
originally announced February 2012.
-
IVOA Recommendation: Vocabularies in the Virtual Observatory Version 1.19
Authors:
Sebastien Derriere,
Alasdair J G Gray,
Norman Gray,
Frederic V Hessman,
Tony Linde,
Andrea Preite Martinez,
Rob Seaman,
Brian Thomas
Abstract:
This document specifies a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organization System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialised vocabularies while letting the rest of the astronomical community access, use, and combine them. The use of current, open standards ensures that VO applications will be able to tap into resources of the growing semantic web. The document provides several examples of useful astronomical vocabularies.
Submitted 3 October, 2011;
originally announced October 2011.
-
Multibody Multipole Methods
Authors:
Dongryeol Lee,
Arkadas Ozakin,
Alexander G. Gray
Abstract:
A three-body potential function can account for interactions among triples of particles which are uncaptured by pairwise interaction functions such as Coulombic or Lennard-Jones potentials. Likewise, a multibody potential of order $n$ can account for interactions among $n$-tuples of particles uncaptured by interaction functions of lower orders. To date, the computation of multibody potential functions for a large number of particles has not been possible due to its $O(N^n)$ scaling cost. In this paper we describe a fast tree-code for efficiently approximating multibody potentials that can be factorized as products of functions of pairwise distances. For the first time, we show how to derive a Barnes-Hut type algorithm for handling interactions among more than two particles. Our algorithm uses two approximation schemes: 1) a deterministic series expansion-based method; 2) a Monte Carlo-based approximation based on the central limit theorem. Our approach guarantees a user-specified bound on the absolute or relative error in the computed potential with an asymptotic probability guarantee. We provide speedup results on a three-body dispersion potential, the Axilrod-Teller potential.
Submitted 30 June, 2012; v1 submitted 13 May, 2011;
originally announced May 2011.
-
Dual-Tree Fast Gauss Transforms
Authors:
Dongryeol Lee,
Alexander G. Gray,
Andrew W. Moore
Abstract:
Kernel density estimation (KDE) is a popular statistical technique for estimating the underlying density distribution with minimal assumptions. Although kernel density estimators can be shown to achieve asymptotic estimation optimality for any input distribution, cross-validating for an optimal parameter requires significant computation dominated by kernel summations. In this paper we present an improvement to the dual-tree algorithm, the first practical kernel summation algorithm for general dimension. Our extension is based on the series-expansion for the Gaussian kernel used by the fast Gauss transform. First, we derive two additional pieces of analytical machinery for extending the original algorithm to utilize a hierarchical data structure, demonstrating the first truly hierarchical fast Gauss transform. Second, we show how to integrate the series-expansion approximation within the dual-tree approach to compute kernel summations with a user-controllable relative error bound. We evaluate our algorithm on real-world datasets in the context of optimal bandwidth selection in kernel density estimation. Our results demonstrate that our new algorithm is the only one that guarantees a hard relative error bound and offers fast performance across a wide range of bandwidths evaluated in cross validation procedures.
Submitted 14 February, 2011;
originally announced February 2011.
-
The magnetic fields of forming solar-like stars
Authors:
S. G. Gregory,
M. Jardine,
C. G. Gray,
J. -F. Donati
Abstract:
Magnetic fields play a crucial role at all stages of the formation of low mass stars and planetary systems. In the final stages, in particular, they control the kinematics of in-falling gas from circumstellar discs, and the launching and collimation of spectacular outflows. The magnetic coupling with the disc is thought to influence the rotational evolution of the star, while magnetised stellar winds control the braking of more evolved stars and may influence the migration of planets. Magnetic reconnection events trigger energetic flares which irradiate circumstellar discs with high energy particles that influence the disc chemistry and set the initial conditions for planet formation. However, it is only in the past few years that the current generation of optical spectropolarimeters has allowed the magnetic fields of forming solar-like stars to be probed in unprecedented detail. In order to do justice to the recent extensive observational programs, new theoretical models are being developed that incorporate magnetic fields with an observed degree of complexity. In this review we draw together disparate results from the classical electromagnetism, molecular physics/chemistry, and geophysics literature, and demonstrate how they can be adapted to construct models of the large scale magnetospheres of stars and planets. We conclude by examining how the incorporation of multipolar magnetic fields into new theoretical models will drive future progress in the field through the elucidation of several observational conundrums.
Submitted 11 August, 2010;
originally announced August 2010.
-
Generative and Latent Mean Map Kernels
Authors:
Nishant A. Mehta,
Alexander G. Gray
Abstract:
We introduce two kernels that extend the mean map, which embeds probability measures in Hilbert spaces. The generative mean map kernel (GMMK) is a smooth similarity measure between probabilistic models. The latent mean map kernel (LMMK) generalizes the non-iid formulation of Hilbert space embeddings of empirical distributions in order to incorporate latent variable models. When comparing certain classes of distributions, the GMMK exhibits beneficial regularization and generalization properties not shown for previous generative kernels. We present experiments comparing support vector machine performance using the GMMK and LMMK between hidden Markov models to the performance of other methods on discrete and continuous observation sequence data. The results suggest that, in many cases, the GMMK has generalization error competitive with or better than other methods.
Submitted 3 May, 2010;
originally announced May 2010.
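As a rough illustration of the mean map that the abstract above builds on, the sketch below estimates the inner product of two empirical mean embeddings from samples. It is not the GMMK or LMMK themselves; the RBF base kernel, the `gamma` parameter, and the function names are assumptions chosen for the example.

```python
import numpy as np


def rbf_gram(X, Y, gamma=1.0):
    """Gram matrix of an RBF kernel between rows of X (n, d) and Y (m, d)."""
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))


def empirical_mean_map_kernel(X, Y, gamma=1.0):
    """Inner product of the empirical mean embeddings of two samples.

    Equals (1 / (n * m)) * sum_i sum_j k(x_i, y_j) for the base kernel k.
    """
    return float(rbf_gram(X, Y, gamma).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_p = rng.normal(0.0, 1.0, size=(50, 2))
    sample_q = rng.normal(3.0, 1.0, size=(50, 2))
    # A sample should look more similar to itself than to a shifted sample.
    print(empirical_mean_map_kernel(sample_p, sample_p))
    print(empirical_mean_map_kernel(sample_p, sample_q))
```

The resulting value is a similarity between two sets of observations, so a Gram matrix built from it could in principle be passed to a kernel method such as a support vector machine.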
-
Sequential category aggregation and partitioning approaches for multi-way contingency tables based on survey and census data
Authors:
L. Fraser Jackson,
Alistair G. Gray,
Stephen E. Fienberg
Abstract:
Large contingency tables arise in many contexts but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables which are summaries of such contingency tables. We propose such an approach in this paper by finding members of a class of restricted log-linear models which maximize the likelihood of the data and use this to find a parsimonious means of representing the table. In contrast with more standard approaches for model search in hierarchical log-linear models (HLLM), our procedure systematically reduces the number of categories of the variables. Through a series of examples, we illustrate the extent to which it can preserve the interaction structure found with HLLMs and be used as a data simplification procedure prior to HLL modeling. A feature of the procedure is that it can easily be applied to many tables with millions of cells, providing a new way of summarizing large data sets in many disciplines. The focus is on information and description rather than statistical testing. The procedure may treat each variable in the table in different ways, preserving full detail, treating it as fully nominal, or preserving ordinality.
Submitted 11 November, 2008;
originally announced November 2008.
-
Learning Isometric Separation Maps
Authors:
Nikolaos Vasiloglou,
Alexander G. Gray,
David V. Anderson
Abstract:
Maximum Variance Unfolding (MVU) and its variants have been very successful in embedding data-manifolds in lower dimensional spaces, often revealing the true intrinsic dimension. In this paper we show how to also incorporate supervised class information into an MVU-like method without breaking its convexity. We call this method the Isometric Separation Map and we show that the resulting kernel matrix can be used as a binary/multiclass Support Vector Machine-like method in a semi-supervised (transductive) framework. We also show that the method always finds a kernel matrix that linearly separates the training data exactly without projecting them into infinite-dimensional spaces. In traditional SVMs we choose a kernel and hope that the data become linearly separable in the kernel space. In this paper we show how the hyperplane can be chosen ad-hoc and the kernel is trained so that data are always linearly separable. Comparisons with Large Margin SVMs show comparable performance.
Submitted 15 April, 2009; v1 submitted 25 October, 2008;
originally announced October 2008.
-
Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection
Authors:
Gordon T. Richards,
Rajesh P. Deo,
Mark Lacy,
Adam D. Myers,
Robert C. Nichol,
Nadia L. Zakamska,
Robert J. Brunner,
W. N. Brandt,
Alexander G. Gray,
John K. Parejko,
Andrew Ptak,
Donald P. Schneider,
Lisa J. Storrie-Lombardi,
Alexander S. Szalay
Abstract:
We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). We apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0 um depth of 56 uJy over an area of ~24 sq. deg; ~70% of these candidates are not identified by applying the same Bayesian algorithm to 4-color SDSS optical data alone. Our selection recovers 97.7% of known type 1 quasars in this area and greatly improves the effectiveness of identifying 3.5<z<5 quasars. Even using only the two shortest wavelength IRAC bandpasses, it is possible to use our Bayesian techniques to select quasars with 97% completeness and as little as 10% contamination. This sample has a photometric redshift accuracy of 93.6% (Delta Z +/-0.3), remaining roughly constant when the two reddest MIR bands are excluded. While our methods are designed to find type 1 (unobscured) quasars, as many as 1200 of the objects are type 2 (obscured) quasar candidates. Coupling deep optical imaging data with deep mid-IR data could enable selection of quasars in significant numbers past the peak of the quasar luminosity function (QLF) to at least z~4. Such a sample would constrain the shape of the QLF and enable quasar clustering studies over the largest range of redshift and luminosity to date, yielding significant gains in our understanding of quasars and the evolution of galaxies.
Submitted 25 February, 2009; v1 submitted 20 October, 2008;
originally announced October 2008.
-
Non-Negative Matrix Factorization, Convexity and Isometry
Authors:
Nikolaos Vasiloglou,
Alexander G. Gray,
David V. Anderson
Abstract:
In this paper we explore avenues for improving the reliability of dimensionality reduction methods such as Non-Negative Matrix Factorization (NMF) as interpretive exploratory data analysis tools. We first explore the difficulties of the optimization problem underlying NMF, showing for the first time that non-trivial NMF solutions always exist and that the optimization problem is actually convex, by using the theory of Completely Positive Factorization. We subsequently explore four novel approaches to finding globally-optimal NMF solutions using various ideas from convex optimization. We then develop a new method, isometric NMF (isoNMF), which preserves non-negativity while also providing an isometric embedding, simultaneously achieving two properties which are helpful for interpretation. Though it results in a more difficult optimization problem, we show experimentally that the resulting method is scalable and even achieves more compact spectra than standard NMF.
Submitted 22 April, 2009; v1 submitted 13 October, 2008;
originally announced October 2008.
-
Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: II. ~1,000,000 Quasars from Data Release Six
Authors:
Gordon T. Richards,
Adam D. Myers,
Alexander G. Gray,
Ryan N. Riegel,
Robert C. Nichol,
Robert J. Brunner,
Alexander S. Szalay,
Donald P. Schneider,
Scott F. Anderson
Abstract:
We present a catalog of 1,172,157 quasar candidates selected from the photometric imaging data of the Sloan Digital Sky Survey (SDSS). The objects are all point sources to a limiting magnitude of i=21.3 from 8417 sq. deg. of imaging from SDSS Data Release 6 (DR6). This sample extends our previous catalog by using the latest SDSS public release data and probing both UV-excess and high-redshift quasars. While the addition of high-redshift candidates reduces the overall efficiency (quasars:quasar candidates) of the catalog to ~80%, it is expected to contain no fewer than 850,000 bona fide quasars -- ~8 times the number of our previous sample, and ~10 times the size of the largest spectroscopic quasar catalog. Cross-matching between our photometric catalog and spectroscopic quasar catalogs from both the SDSS and 2dF Surveys yields 88,879 spectroscopically confirmed quasars. For judicious selection of the most robust UV-excess sources (~500,000 objects in all), the efficiency is nearly 97% -- more than sufficient for detailed statistical analyses. The catalog's completeness to type 1 (broad-line) quasars is expected to be no worse than 70%, with most missing objects occurring at z<0.7 and 2.5<z<3.0. In addition to classification information, we provide photometric redshift estimates (typically good to Delta z +/- 0.3 [2 sigma]) and cross-matching with radio, X-ray, and proper motion catalogs. Finally, we consider the catalog's utility for determining the optical luminosity function of quasars and are able to confirm the flattening of the bright-end slope of the quasar luminosity function at z~4 as compared to z~2.
Submitted 23 September, 2008;
originally announced September 2008.
-
Detailed Examination of Transport Coefficients in Cubic-Plus-Quartic Oscillator Chains
Authors:
G. R. Lee-Dadswell,
B. G. Nickel,
C. G. Gray
Abstract:
We examine the thermal conductivity and bulk viscosity of a one-dimensional (1D) chain of particles with cubic-plus-quartic interparticle potentials and no on-site potentials. This system is equivalent to the FPU-alpha beta system in a subset of its parameter space. We identify three distinct frequency regimes which we call the hydrodynamic regime, the perturbative regime and the collisionless regime. In the lowest frequency regime (the hydrodynamic regime) heat is transported ballistically by long wavelength sound modes. The model that we use to describe this behaviour predicts that as the frequency goes to zero the frequency dependent bulk viscosity and the frequency dependent thermal conductivity should diverge with the same power law dependence on frequency. Thus, we can define the bulk Prandtl number as the ratio of the bulk viscosity to the thermal conductivity (with suitable prefactors to render it dimensionless). This dimensionless ratio should approach a constant value as frequency goes to zero. We use mode-coupling theory to predict the zero frequency limit. Values of the bulk Prandtl number from simulations are in agreement with these predictions over a wide range of system parameters. In the middle frequency regime, which we call the perturbative regime, heat is transported by sound modes which are damped by four-phonon processes. We call the highest frequency regime the collisionless regime since at these frequencies the observing times are much shorter than the characteristic relaxation times of phonons. The perturbative and collisionless regimes are discussed in detail in the appendices.
Submitted 4 October, 2007;
originally announced October 2007.
-
A high redshift detection of the integrated Sachs-Wolfe effect
Authors:
Tommaso Giannantonio,
Robert G. Crittenden,
Robert C. Nichol,
Ryan Scranton,
Gordon T. Richards,
Adam D. Myers,
Robert J. Brunner,
Alexander G. Gray,
Andrew J. Connolly,
Donald P. Schneider
Abstract:
We present evidence of a large angle correlation between the cosmic microwave background measured by WMAP and a catalog of photometrically detected quasars from the SDSS. The observed cross correlation is (0.30 +- 0.14) microK at zero lag, with a shape consistent with that expected for correlations arising from the integrated Sachs-Wolfe effect. The photometric redshifts of the quasars are centered at z ~ 1.5, making this the deepest survey in which such a correlation has been observed. Assuming this correlation is due to the ISW effect, this constitutes the earliest evidence yet for dark energy and it can be used to constrain exotic dark energy models.
Submitted 25 September, 2006; v1 submitted 26 July, 2006;
originally announced July 2006.
-
First Measurement of the Clustering Evolution of Photometrically-Classified Quasars
Authors:
Adam D. Myers,
Robert J. Brunner,
Gordon T. Richards,
Robert C. Nichol,
Donald P. Schneider,
Daniel E. Vanden Berk,
Ryan Scranton,
Alexander G. Gray,
Jon Brinkmann
Abstract:
We present new measurements of the quasar autocorrelation from a sample of ~80,000 photometrically-classified quasars taken from SDSS DR1. We find a best-fit model of $\omega(\theta) = (0.066^{+0.026}_{-0.024})\,\theta^{-(0.98\pm0.15)}$ for the angular autocorrelation, consistent with estimates from spectroscopic quasar surveys. We show that only models with little or no evolution in the clustering of quasars in comoving coordinates since z~1.4 can recover a scale-length consistent with local galaxies and Active Galactic Nuclei (AGNs). A model with little evolution of quasar clustering in comoving coordinates is best explained in the current cosmological paradigm by rapid evolution in quasar bias. We show that quasar biasing must have changed from b_Q~3 at a (photometric) redshift of z=2.2 to b_Q~1.2-1.3 by z=0.75. Such a rapid increase with redshift in biasing implies that quasars at z~2 cannot be the progenitors of modern L* objects, rather they must now reside in dense environments, such as clusters. Similarly, the duration of the UVX quasar phase must be short enough to explain why local UVX quasars reside in essentially unbiased structures. Our estimates of b_Q are in good agreement with recent spectroscopic results, which demonstrate the implied evolution in b_Q is consistent with quasars inhabiting halos of similar mass at every redshift. Treating quasar clustering as a function of both redshift and luminosity, we find no evidence for luminosity dependence in quasar clustering, and that redshift evolution thus affects quasar clustering more than changes in quasars' luminosity. We provide a new method for quantifying stellar contamination in photometrically-classified quasar catalogs via the correlation function.
Submitted 24 October, 2005; v1 submitted 12 October, 2005;
originally announced October 2005.
-
Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: 100,000 z<3 Quasars from Data Release One
Authors:
Gordon T. Richards,
Robert C. Nichol,
Alexander G. Gray,
Robert J. Brunner,
Robert H. Lupton,
Daniel E. Vanden Berk,
Shang Shan Chong,
Michael A. Weinstein,
Donald P. Schneider,
Scott F. Anderson,
Jeffrey A. Munn,
Hugh C. Harris,
Michael A. Strauss,
Xiaohui Fan,
James E. Gunn,
Zeljko Ivezic,
Donald G. York,
J. Brinkmann
Abstract:
We present a catalog of 100,563 unresolved, UV-excess (UVX) quasar candidates to g=21 from 2099 deg^2 of the Sloan Digital Sky Survey (SDSS) Data Release One (DR1) imaging data. Existing spectra of 22,737 sources reveal that 22,191 (97.6%) are quasars; accounting for the magnitude dependence of this efficiency, we estimate that 95,502 (95.0%) of the objects in the catalog are quasars. Such a high efficiency is unprecedented in broad-band surveys of quasars. This "proof-of-concept" sample is designed to be maximally efficient, but still has 94.7% completeness to unresolved, g<~19.5, UVX quasars from the DR1 quasar catalog. This efficient and complete selection is the result of our application of a probability density type analysis to training sets that describe the 4-D color distribution of stars and spectroscopically confirmed quasars in the SDSS. Specifically, we use a non-parametric Bayesian classification, based on kernel density estimation, to parameterize the color distribution of astronomical sources -- allowing for fast and robust classification. We further supplement the catalog by providing photometric redshifts and matches to FIRST/VLA, ROSAT, and USNO-B sources. Future work needed to extend this selection algorithm to larger redshifts, fainter magnitudes, and resolved sources is discussed. Finally, we examine some science applications of the catalog, particularly a tentative quasar number counts distribution covering the largest range in magnitude (14.2<g<21.0) ever made within the framework of a single quasar survey.
Submitted 26 August, 2004;
originally announced August 2004.
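The non-parametric Bayesian classification based on kernel density estimation mentioned in the abstract above can be sketched roughly as follows. This is only a minimal illustration, not the authors' pipeline: the Gaussian kernel, the single fixed `bandwidth`, the `prior_quasar` value, and all function names are assumptions, and the naive evaluation below costs O(n·m) rather than using any fast tree-based method.

```python
import numpy as np


def gaussian_kde_density(train, query, bandwidth):
    """Gaussian kernel density estimate at each query point.

    train: (n, d) training colors; query: (m, d) points to evaluate.
    """
    d = train.shape[1]
    diff = query[:, None, :] - train[None, :, :]
    sq = np.sum(diff**2, axis=2)
    norm = (2.0 * np.pi * bandwidth**2) ** (d / 2.0)
    return np.exp(-0.5 * sq / bandwidth**2).mean(axis=1) / norm


def quasar_posterior(query, quasars, stars, prior_quasar=0.5, bandwidth=0.1):
    """Posterior probability of the quasar class via Bayes' rule.

    Class-conditional densities come from separate KDEs on each training set.
    """
    pq = prior_quasar * gaussian_kde_density(quasars, query, bandwidth)
    ps = (1.0 - prior_quasar) * gaussian_kde_density(stars, query, bandwidth)
    total = pq + ps
    # Where both densities underflow to zero, fall back to the prior.
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, pq / safe_total, prior_quasar)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 4-D "colors": two synthetic clusters, not real survey data.
    quasars = rng.normal(0.0, 0.2, size=(200, 4))
    stars = rng.normal(1.0, 0.2, size=(200, 4))
    query = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
    print(quasar_posterior(query, quasars, stars, bandwidth=0.2))
```

An object would then be kept as a candidate when its posterior exceeds a chosen threshold; the threshold and the prior trade efficiency against completeness.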
-
Multi-Tree Methods for Statistics on Very Large Datasets in Astronomy
Authors:
Alexander G. Gray,
Andrew W. Moore,
Robert C. Nichol,
Andrew J. Connolly,
Christopher Genovese,
Larry Wasserman
Abstract:
Many fundamental statistical methods have become critical tools for scientific data analysis yet do not scale tractably to modern large datasets. This paper will describe very recent algorithms based on computational geometry which have dramatically reduced the computational complexity of 1) kernel density estimation (which also extends to nonparametric regression, classification, and clustering), and 2) the n-point correlation function for arbitrary n. These new multi-tree methods typically yield orders of magnitude in speedup over the previous state of the art for similar accuracy, making millions of data points tractable on desktop workstations for the first time.
Submitted 8 January, 2004;
originally announced January 2004.
-
Progress in Classical and Quantum Variational Principles
Authors:
C. G. Gray,
G. Karl,
V. A. Novikov
Abstract:
We review the development and practical uses of a generalized Maupertuis least action principle in classical mechanics, in which the action is varied under the constraint of fixed mean energy for the trial trajectory. The original Maupertuis (Euler-Lagrange) principle constrains the energy at every point along the trajectory. The generalized Maupertuis principle is equivalent to Hamilton's principle. Reciprocal principles are also derived for both the generalized Maupertuis and the Hamilton principles. The Reciprocal Maupertuis Principle is the classical limit of Schrödinger's variational principle of wave mechanics, and is also very useful to solve practical problems in both classical and semiclassical mechanics, in complete analogy with the quantum Rayleigh-Ritz method. Classical, semiclassical and quantum variational calculations are carried out for a number of systems, and the results are compared. Pedagogical as well as research problems are used as examples, which include nonconservative as well as relativistic systems.
Submitted 11 December, 2003;
originally announced December 2003.