-
Summarization from Leaderboards to Practice: Choosing A Representation Backbone and Ensuring Robustness
Authors:
David Demeter,
Oshin Agarwal,
Simon Ben Igeri,
Marko Sterbentz,
Neil Molino,
John M. Conroy,
Ani Nenkova
Abstract:
Academic literature does not give much guidance on how to build the best possible customer-facing summarization system from existing research components. Here we present analyses to inform the selection of a system backbone from popular models; we find that in both automatic and human evaluation, BART performs better than PEGASUS and T5. We also find that when applied cross-domain, summarizers exhibit considerably worse performance. At the same time, a system fine-tuned on heterogeneous domains performs well on all domains and will be most suitable for a broad-domain summarizer. Our work highlights the need for heterogeneous domain summarization benchmarks. We find considerable variation in system output that can be captured only with human evaluation and is thus unlikely to be reflected in standard leaderboards that rely only on automatic evaluation.
Submitted 18 June, 2023;
originally announced June 2023.
-
Two to Five Truths in Non-Negative Matrix Factorization
Authors:
John M. Conroy,
Neil P Molino,
Brian Baughman,
Rod Gomez,
Ryan Kaliszewski,
Nicholas A. Lines
Abstract:
In this paper, we explore the role of matrix scaling on a matrix of counts when building a topic model using non-negative matrix factorization. We present a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly improve the quality of a non-negative matrix factorization. The results parallel those in the spectral graph clustering work of \cite{Priebe:2019}, where the authors proved that adjacency spectral embedding (ASE) spectral clustering was more likely to discover core-periphery partitions and Laplacian spectral embedding (LSE) was more likely to discover affinity partitions. In text analysis, non-negative matrix factorization (NMF) is typically applied to a matrix of co-occurrence counts of "contexts" and "terms". The matrix scaling inspired by LSE gives significant improvement for text topic models on a variety of datasets. We illustrate how much matrix scaling can improve the quality of an NMF topic model on three datasets for which human annotation is available. Using the adjusted Rand index (ARI), a measure of cluster similarity, we see an increase of 50\% for Twitter data and over 200\% for a newsgroup dataset versus using raw counts, which is the analogue of ASE. For clean data, such as those from the Document Understanding Conference, NL gives over 40\% improvement over ASE. We conclude with some analysis of this phenomenon and some connections of this scaling with other matrix scaling methods.
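A minimal sketch of the idea, assuming the NL-inspired scaling is the two-sided degree normalization D_r^{-1/2} C D_c^{-1/2} of the contexts-by-terms count matrix C. This is the analogue of D^{-1/2} A D^{-1/2} for graphs. The paper's exact scaling and NMF solver may differ. The function names `nl_scale`, `nmf` and `topic_labels` are hypothetical, and the factorization is plain Lee-Seung multiplicative updates, not necessarily the authors' solver.

```python
import numpy as np


def nl_scale(counts: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical NL-style scaling D_r^{-1/2} C D_c^{-1/2} of a count matrix C.

    D_r and D_c hold the row and column sums (the "degrees" of contexts and
    terms); eps keeps empty rows/columns from dividing by zero.
    """
    row = counts.sum(axis=1)
    col = counts.sum(axis=0)
    r = 1.0 / np.sqrt(np.maximum(row, eps))
    c = 1.0 / np.sqrt(np.maximum(col, eps))
    return counts * r[:, None] * c[None, :]


def nmf(X: np.ndarray, k: int, iters: int = 200, seed: int = 0, eps: float = 1e-10):
    """Lee-Seung multiplicative updates for min ||X - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def topic_labels(counts: np.ndarray, k: int, scale: bool = True) -> np.ndarray:
    """Assign each context (row) to the topic with the largest loading in W."""
    X = nl_scale(counts) if scale else counts.astype(float)
    W, _ = nmf(X, k)
    return W.argmax(axis=1)
```

Labels from `scale=True` can be compared with labels from `scale=False` (raw counts, the ASE analogue). Scoring each against human annotations with the adjusted Rand index, for example via scikit-learn's `adjusted_rand_score`, mirrors the kind of evaluation described above.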
Submitted 5 September, 2023; v1 submitted 6 May, 2023;
originally announced May 2023.
-
Structure Calculations Without Effective Interactions
Authors:
J. A. Secrest,
J. M. Conroy,
H. G. Miller
Abstract:
Good approximate eigenstates of a Hamiltonian operator that possesses a point as well as a continuous spectrum have been obtained using the Lanczos algorithm. Iterating with the bare Hamiltonian operator yields spurious solutions, which can easily be identified. The rms radius of the ground-state eigenvector, for example, is calculated using the bare operator.
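For readers unfamiliar with the method, here is a generic Lanczos sketch for the lowest eigenvalue of a real symmetric matrix standing in for a discretized Hamiltonian. It does not reproduce the paper's model, its continuous spectrum, or its treatment of spurious solutions. The helper name `lanczos_lowest` is hypothetical. Full reorthogonalization is an assumption made here to keep the sketch numerically clean.

```python
import numpy as np


def lanczos_lowest(H: np.ndarray, k: int, seed: int = 0) -> float:
    """Lowest Ritz value of symmetric H after k Lanczos steps (hypothetical helper)."""
    n = H.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 0))
    for j in range(k):
        Q[:, j] = q
        w = H @ q
        alpha[j] = q @ w
        # Subtract projections on all previous Lanczos vectors; this removes the
        # alpha_j q_j and beta_{j-1} q_{j-1} terms and also fights loss of orthogonality.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:  # invariant subspace reached: stop early
                k = j + 1
                break
            q = w / beta[j]
    # Tridiagonal projection T = Q^T H Q; its eigenvalues are the Ritz values.
    T = np.diag(alpha[:k]) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)
    return float(np.linalg.eigvalsh(T)[0])
```

Because the Ritz values come from an orthogonal projection of H, the lowest Ritz value never lies below the true ground-state eigenvalue and approaches it from above as k grows.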
Submitted 8 July, 2021;
originally announced July 2021.
-
Applying the Maximum Entropy Technique to the Gaussian Dispersion Plume Model
Authors:
J. A. Secrest,
J. M. Conroy,
H. G. Miller
Abstract:
The Maximum Entropy (MaxEnt) technique is applied to the derivation of the Gaussian Dispersion Plume Model as well as to more complex transport phenomena such as the one-dimensional advection equation, the one-dimensional diffusion equation, the one-dimensional advection-diffusion equation, and finally the multi-dimensional advection-diffusion equation. Further application is discussed.
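For reference, here is the textbook one-dimensional MaxEnt argument. With only normalization, mean and variance as constraints, the entropy-maximizing density is a Gaussian. Applying it independently in the crosswind (y) and vertical (z) directions gives the familiar Gaussian plume form without ground reflection. The symbols Q (emission rate), u (wind speed) and H (effective release height) are introduced here and are not taken from the abstract. This is a sketch of the standard result, not a reproduction of the paper's derivation.

```latex
\max_{p}\; S[p] = -\int p(x)\ln p(x)\,dx
\quad\text{s.t.}\quad \int p\,dx = 1,\quad \int x\,p\,dx = \mu,\quad \int (x-\mu)^2\,p\,dx = \sigma^2

\frac{\delta}{\delta p}\Big[S - \lambda_0\!\int p\,dx - \lambda_1\!\int x\,p\,dx - \lambda_2\!\int (x-\mu)^2 p\,dx\Big] = 0
\;\Rightarrow\; p(x) = \exp\!\big[-1-\lambda_0-\lambda_1 x-\lambda_2 (x-\mu)^2\big]

\text{mean and variance constraints} \Rightarrow \lambda_1 = 0,\quad \lambda_2 = \frac{1}{2\sigma^2}:
\qquad p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\Big[-\frac{(x-\mu)^2}{2\sigma^2}\Big]

C(x,y,z) = \frac{Q}{2\pi u\,\sigma_y(x)\,\sigma_z(x)}
\exp\!\Big[-\frac{y^2}{2\sigma_y^2(x)}\Big]\exp\!\Big[-\frac{(z-H)^2}{2\sigma_z^2(x)}\Big]
```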
Submitted 22 October, 2020;
originally announced October 2020.
-
A Unified View of Transport Equations
Authors:
J. A. Secrest,
J. M. Conroy,
H. G. Miller
Abstract:
Distribution functions of many static transport equations are found using the Maximum Entropy Principle. The equations of constraint which contain the relevant dynamical information are simply the low-lying moments of the distributions. Systems subject to conservative forces have also been considered.
Submitted 15 June, 2019;
originally announced June 2019.
-
On a 'Two Truths' Phenomenon in Spectral Graph Clustering
Authors:
Carey E. Priebe,
Youngser Park,
Joshua T. Vogelstein,
John M. Conroy,
Vince Lyzinski,
Minh Tang,
Avanti Athreya,
Joshua Cape,
Eric Bridgeford
Abstract:
Clustering is concerned with coherently grouping observations without any explicit concept of true groupings. Spectral graph clustering - clustering the vertices of a graph based on their spectral embedding - is commonly approached via K-means (or, more generally, Gaussian mixture model) clustering composed with either Laplacian or Adjacency spectral embedding (LSE or ASE). Recent theoretical results provide new understanding of the problem and solutions, and lead us to a 'Two Truths' LSE vs. ASE spectral graph clustering phenomenon convincingly illustrated here via a diffusion MRI connectome data set: the different embedding methods yield different clustering results, with LSE capturing left hemisphere/right hemisphere affinity structure and ASE capturing gray matter/white matter core-periphery structure.
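The two embeddings being contrasted can be sketched in a few lines. ASE factors the adjacency matrix A. LSE factors the normalized matrix D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix. This sketch assumes a symmetric A with no isolated vertices. The helper names `ase` and `lse` are hypothetical and are not the authors' code. Clustering the rows of each embedding with K-means or a Gaussian mixture (for example scikit-learn's `KMeans` or `GaussianMixture`) then gives the two partitions.

```python
import numpy as np


def _top_embedding(M: np.ndarray, d: int) -> np.ndarray:
    """Eigenvectors for the d largest-magnitude eigenvalues of symmetric M,
    scaled by sqrt(|eigenvalue|); X @ X.T reproduces the rank-d part of M
    only when the retained eigenvalues are positive."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def ase(A: np.ndarray, d: int) -> np.ndarray:
    """Adjacency spectral embedding: embed A itself."""
    return _top_embedding(A, d)


def lse(A: np.ndarray, d: int) -> np.ndarray:
    """Laplacian spectral embedding: embed D^{-1/2} A D^{-1/2} (assumes all degrees > 0)."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return _top_embedding(A * inv_sqrt[:, None] * inv_sqrt[None, :], d)
```

The degree normalization in LSE down-weights high-degree "core" vertices. That is the intuition for why the two embeddings can expose different structure in the same graph.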
Submitted 11 February, 2019; v1 submitted 23 August, 2018;
originally announced August 2018.
-
Waxman's Algorithm for non-Hermitian Hamiltonian Operators
Authors:
S. R. Chamberlain,
J. G. Tucker,
J. M. Conroy,
H. G. Miller
Abstract:
An algorithm for finding the bound-state eigenvalues and eigenfunctions of a Hermitian Hamiltonian operator using Green's method, developed by Waxman \cite{W98}, has been extended to include non-Hermitian Hamiltonian operators.
Submitted 22 October, 2017;
originally announced October 2017.
-
MAXENT and the Tsallis Parameter
Authors:
J. M. Conroy,
H. G. Miller
Abstract:
The nonextensive entropic measure proposed by Tsallis introduces a parameter, q, which is not defined but rather must be determined. The value of q is typically determined from a piece of data and then fixed over the range of interest. On the other hand, from a phenomenological viewpoint, there are instances in which q cannot be treated as a constant.
We present two distinct approaches for determining q, depending on the form of the equations of constraint for the particular system. In the first case, the equations of constraint for an operator O can be written as $\mathrm{Tr}[F^{q}O]=C$, where C may be an explicit function of the distribution function, F. In this case one can solve an equivalent MAXENT problem, which yields q as a function of the corresponding Lagrange multiplier. As an illustration, the exact solutions to the static Generalized Fokker-Planck Equation (GFP) are obtained from MAXENT. As in the case where C is a constant, if q is treated as a variable within the MAXENT framework, the entropic measure is trivially maximized for all values of q. Therefore q must be determined from existing data. In the second case, an additional equation of constraint exists that cannot be brought into the above form. In this case the additional equation of constraint may be used to determine the fixed value of q.
Submitted 7 August, 2014;
originally announced August 2014.
-
The Tsallis Parameter
Authors:
J. M. Conroy,
H. G. Miller
Abstract:
The exact solution of a particular form of the stationary state generalized Fokker-Planck equations, which is given under certain conditions by the classical Tsallis distribution, is compared with the solution of the MAXENT equations obtained using the classical Tsallis entropy. The solutions only agree provided the Tsallis parameter, q, is no longer taken to be constant.
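To make the role of q concrete, here is a sketch of the Tsallis q-exponential and the resulting discrete Tsallis weights. It uses one common convention, e_q(x) = [1 + (1 - q) x]^{1/(1 - q)} with a cutoff to zero where the bracket is non-positive, which reduces to exp(x) as q → 1. The paper's "classical Tsallis distribution" may use a different parameterization. The helper names `q_exp` and `tsallis_probs` are hypothetical. The code treats q as a fixed input, whereas the abstract's point is that q must be allowed to vary for the two solutions to agree.

```python
import math


def q_exp(x: float, q: float) -> float:
    """Tsallis q-exponential e_q(x) = [1 + (1 - q) x]^{1/(1 - q)}, with e_1(x) = exp(x)."""
    if math.isclose(q, 1.0):
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q < 1.0:
            return 0.0  # Tsallis cutoff: states beyond it get zero weight
        raise ValueError("e_q(x) diverges for q > 1 and x >= 1/(q - 1)")
    return base ** (1.0 / (1.0 - q))


def tsallis_probs(energies, beta: float, q: float):
    """Normalized weights p_i proportional to e_q(-beta * E_i) over discrete energies."""
    weights = [q_exp(-beta * e, q) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]
```

For q < 1 the cutoff gives zero probability to high-energy states. For q > 1 the weights decay as a power law rather than exponentially.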
Submitted 31 January, 2013;
originally announced January 2013.
-
Fast Approximate Quadratic Programming for Large (Brain) Graph Matching
Authors:
Joshua T. Vogelstein,
John M. Conroy,
Vince Lyzinski,
Louis J. Podrazik,
Steven G. Kratzer,
Eric T. Harley,
Donniell E. Fishkind,
R. Jacob Vogelstein,
Carey E. Priebe
Abstract:
Quadratic assignment problems (QAPs) arise in a wide variety of domains, ranging from operations research to graph theory to computer vision to neuroscience. In the age of big data, graph-valued data is becoming more prominent, and with it, a desire to run algorithms on ever larger graphs. Because QAP is NP-hard, exact algorithms are intractable. Approximate algorithms necessarily employ an accuracy/efficiency trade-off. We developed a fast approximate quadratic assignment algorithm (FAQ). FAQ finds a local optimum in (worst-case) time cubic in the number of vertices, similar to other approximate QAP algorithms. We demonstrate empirically that our algorithm is faster and achieves a lower objective value on over 80% of the suite of QAP benchmarks, compared with the previous state of the art. Applying the algorithms to our motivating example, matching C. elegans connectomes (brain graphs), we find that FAQ achieves the optimal performance in record time, whereas none of the others even finds the optimum.
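SciPy (1.6 and later) ships an FAQ implementation as `scipy.optimize.quadratic_assignment(A, B, method="faq")`. The sketch below uses it, rather than the authors' original code, to match a graph to a relabeled copy of itself. `match_graphs` is a hypothetical helper name. Graph matching is posed as maximizing trace(A^T P B P^T) over permutation matrices P, hence `maximize=True`. Because FAQ stops at a local optimum, the returned permutation is not guaranteed to be the true relabeling.

```python
import numpy as np
from scipy.optimize import quadratic_assignment


def match_graphs(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return perm with node i of A matched to node perm[i] of B (FAQ, maximizing overlap)."""
    # Default FAQ settings start from the barycenter of the doubly stochastic matrices.
    res = quadratic_assignment(A, B, method="faq", options={"maximize": True})
    return res.col_ind
```

Each FAQ iteration solves a linear assignment subproblem, which is where the cubic-time cost per iteration mentioned in the abstract comes from.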
Submitted 13 September, 2014; v1 submitted 22 December, 2011;
originally announced December 2011.
-
The Rayleigh Quotient
Authors:
M. K. G. Kruse,
J. M. Conroy,
H. G. Miller
Abstract:
The central role of the Rayleigh quotient in many-body physics is discussed. Various many-body methods can be obtained either from an attempt to evaluate the Rayleigh quotient directly or through various variational approximations. Rather than dwell on the technical details necessary to obtain the equations of the various many-body methods, we concentrate on how they can be obtained from the Rayleigh quotient, and on some of the consequences of the approximations involved in their evaluation.
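The variational property behind this discussion can be sketched for a finite Hermitian matrix H standing in for a many-body Hamiltonian. R[x] = ⟨x|H|x⟩/⟨x|x⟩ is bounded below by the lowest eigenvalue. Minimizing R over a trial subspace (Rayleigh-Ritz) gives an upper bound on the ground-state energy. The helper names are hypothetical, and real many-body methods restrict the trial space in far more structured ways.

```python
import numpy as np


def rayleigh_quotient(H: np.ndarray, x: np.ndarray) -> float:
    """R[x] = <x|H|x> / <x|x> for Hermitian H and a nonzero trial vector x."""
    return float(np.real(np.vdot(x, H @ x) / np.vdot(x, x)))


def ritz_lowest(H: np.ndarray, V: np.ndarray) -> float:
    """Minimum of R[x] over x in span(V): lowest eigenvalue of Q^H H Q,
    where Q is an orthonormal basis of span(V)."""
    Q, _ = np.linalg.qr(V)
    return float(np.linalg.eigvalsh(Q.conj().T @ H @ Q)[0])
```

Enlarging the trial subspace can only lower (never raise) the Ritz estimate. When the subspace is the whole space, the estimate equals the exact ground-state eigenvalue.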
Submitted 1 December, 2011;
originally announced December 2011.
-
Thermodynamic Consistency of the $q$-Deformed Fermi-Dirac Distribution in Nonextensive Thermostatics
Authors:
J. M. Conroy,
H. G. Miller,
A. R. Plastino
Abstract:
The $q$-deformed statistics for fermions arising within the non-extensive thermostatistical formalism has recently been applied to the study of various quantum many-body systems. The aim of the present note is to point out some subtle difficulties presented by this approach in connection with the problem of thermodynamic consistency. Different possible ways to apply the $q$-deformed quantum distributions in a thermodynamically consistent way are considered.
Submitted 20 June, 2010;
originally announced June 2010.
-
Color Superconductivity and Tsallis Statistics
Authors:
Justin M. Conroy,
H. G. Miller
Abstract:
The generalized non-extensive statistics proposed by Tsallis have been successfully utilized in many systems where long-range interactions are present. For high-density quark matter, an attractive long-range interaction arising from single-gluon exchange suggests the formation of a diquark condensate. We study the effects of a change to Tsallis statistics on this color-superconducting phase for two quark flavors. By numerically solving the gap equation, we obtain a generalization of the universality condition $2\phi_{0}/T_{C} \approx 3.52$ and determine the temperature dependence of the gap. For values of the Tsallis parameter $q \approx 1$, the specific heat is exponential, becoming more linear as $q$ increases. This suggests that for larger values of $q$, s-wave color superconductors behave like high-$T_c$ superconductors rather than weak superconductors.
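The number 3.52 in the universality condition is the standard weak-coupling BCS ratio. Here $\phi_0$ is the zero-temperature gap and $T_C$ the critical temperature, in units with $k_B = 1$. BCS theory gives $\phi_0 = (\pi/e^{\gamma})\,T_C$, with $\gamma$ the Euler-Mascheroni constant. The arithmetic below is the ordinary Boltzmann-Gibbs value, presumably the $q \to 1$ limit of the generalized condition; it is not the paper's Tsallis result. `bcs_gap_ratio` is a hypothetical helper name.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def bcs_gap_ratio() -> float:
    """Weak-coupling BCS ratio 2*phi_0 / T_C = 2*pi / e^gamma (units with k_B = 1)."""
    return 2.0 * math.pi / math.exp(EULER_GAMMA)


print(f"{bcs_gap_ratio():.3f}")  # 3.528
```

The exact value, about 3.528, is the figure usually quoted as 3.52.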
Submitted 2 January, 2008;
originally announced January 2008.
-
Five-dimensional Trinification Improved
Authors:
Christopher D. Carone,
Justin M. Conroy
Abstract:
We present improved models of trinification in five dimensions. Unified symmetry is broken by a combination of orbifold projections and a boundary Higgs sector. The latter can be decoupled from the theory, realizing a Higgsless limit in which the scale of exotic massive gauge fields is set by the compactification radius. Electroweak Higgs doublets are identified with the fifth components of gauge fields and Yukawa interactions arise via Wilson loops. The result is a simple low-energy effective theory that is consistent with the constraints from proton decay and gauge unification.
Submitted 29 July, 2005; v1 submitted 25 July, 2005;
originally announced July 2005.
-
Higgsless GUT Breaking and Trinification
Authors:
Christopher D. Carone,
Justin M. Conroy
Abstract:
Boundary conditions on an extra-dimensional interval can be chosen to break bulk gauge symmetries and to reduce the rank of the gauge group. We consider this mechanism in models with gauge trinification. We determine the boundary conditions necessary to break the trinified gauge group directly down to that of the standard model. Working in an effective theory for the gauge symmetry-breaking parameters on a boundary, we examine the limit in which the GUT-breaking sector is Higgsless and show how one may obtain the low-energy particle content of the minimal supersymmetric standard model. We find that gauge unification is preserved in this scenario, and that the differential gauge coupling running is logarithmic above the scale of compactification. We compare the phenomenology of our model to that of four-dimensional trinified theories.
Submitted 24 September, 2004; v1 submitted 9 July, 2004;
originally announced July 2004.
-
Universal Extra Dimensions and Kaluza Klein Bound States
Authors:
Christopher D. Carone,
Justin M. Conroy,
Marc Sher,
Ismail Turan
Abstract:
We study the bound states of the Kaluza-Klein (KK) excitations of quarks in certain models of Universal Extra Dimensions. Such bound states may be detected at future lepton colliders in the cross section for the pair production of KK-quarks near threshold. For typical values of model parameters, we find that "KK-quarkonia" have widths in the 10-100 MeV range, and production cross sections of order a few picobarns for the lightest resonances. Two-body decays of the constituent KK-quarks lead to distinctive experimental signatures. We point out that such KK resonances may be discovered before any of the higher KK modes.
Submitted 24 February, 2004; v1 submitted 3 December, 2003;
originally announced December 2003.
-
Phenomenology of Lorentz-Conserving Noncommutative QED
Authors:
Justin M. Conroy,
Herry J. Kwee,
Vahagn Nazaryan
Abstract:
Recently a version of Lorentz-conserving noncommutative field theory (NCFT) has been suggested. The underlying Lie algebra of the theory is the same as that of Doplicher, Fredenhagen, and Roberts. In Lorentz-conserving NCFT, the matrix parameter $\theta^{\mu\nu}$ that characterizes canonical NCFTs is promoted to an operator $\hat{\theta}^{\mu\nu}$ that transforms as a Lorentz tensor. In this paper, we calculate phenomenological consequences of the QED version of this theory by looking at various collider processes. In particular, we calculate modifications to Møller scattering, Bhabha scattering, $e^+e^- \to \mu^+\mu^-$, and $e^+e^- \to \gamma\gamma$. We obtain bounds on the noncommutativity scale from the existing experiments at LEP and make predictions for what may be seen in future collider experiments.
Submitted 20 May, 2003;
originally announced May 2003.
-
Bulk Majorons at Colliders
Authors:
Christopher D. Carone,
Justin M. Conroy,
Herry J. Kwee
Abstract:
Lepton number violation may arise via the spontaneous breakdown of a global symmetry. In extra dimensions, spontaneous lepton number violation in the bulk implies the existence of a Goldstone boson, the majoron $J^{(0)}$, as well as an accompanying tower of Kaluza-Klein (KK) excitations, $J^{(n)}$. Even if the zero-mode majoron is very weakly interacting, so that detection in low-energy processes is difficult, the sum over the tower of KK modes may partially compensate in processes of relevance at high-energy colliders. Here we consider the inclusive differential and total cross sections for $e^-e^- \to W^-W^-J$, where $J$ represents a sum over KK modes. We show that allowed parameter choices exist for which this process may be accessible to a TeV-scale electron collider.
Submitted 8 April, 2002; v1 submitted 3 April, 2002;
originally announced April 2002.