-
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
Authors:
Saurabh Srivastava,
Annarose M B,
Anto P V,
Shashank Menon,
Ajay Sukumar,
Adwaith Samod T,
Alan Philipose,
Stevin Prince,
Sooraj Thomas
Abstract:
We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with functionalization of other benchmarks to follow. When evaluating current state-of-the-art models over snapshots of MATH(), we find a reasoning gap: the percentage difference between the static and functional accuracies. We find reasoning gaps from 58.35% to 80.31% among the state-of-the-art closed and open-weights models that perform well on static benchmarks, with the caveat that the gaps are likely to be smaller with more sophisticated prompting strategies. Here we show that models that anecdotally have good reasoning performance on real-world tasks have quantifiably lower gaps, motivating the open problem of building "gap 0" models. Code for evaluation and new evaluation datasets, three MATH() snapshots, are publicly available at https://github.com/consequentai/fneval/.
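As a rough illustration of the reasoning gap described in this abstract, the sketch below computes a percentage difference between static and functional accuracy. It assumes the gap is measured as the drop relative to the static accuracy; the abstract does not spell out the normalization, so that choice, the function name `reasoning_gap`, and the sample accuracies are all hypothetical.

```python
def reasoning_gap(static_acc: float, functional_acc: float) -> float:
    """Percentage drop from static to functional accuracy.

    Assumption: the gap is normalized by the static accuracy.
    A value of 0 corresponds to a "gap 0" model.
    """
    if static_acc <= 0:
        raise ValueError("static accuracy must be positive")
    return 100.0 * (static_acc - functional_acc) / static_acc


if __name__ == "__main__":
    # Hypothetical accuracies for illustration only (not from the paper).
    gap = reasoning_gap(static_acc=0.50, functional_acc=0.20)
    print(f"reasoning gap: {gap:.2f}%")
```

Under this assumed definition, a model that scores the same on the static problems and on a functional snapshot has a gap of zero, which matches the "gap 0" goal named in the abstract.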
Submitted 29 February, 2024;
originally announced February 2024.
-
Interoperable Workflows by Exchanging Grid-Based Data between Quantum-Chemical Program Packages
Authors:
Kevin Focke,
Matteo De Santis,
Mario Wolter,
Jessica A. Martinez B,
Valérie Vallet,
André Severo Pereira Gomes,
Małgorzata Olejniczak,
Christoph R. Jacob
Abstract:
Quantum-chemical subsystem and embedding methods require complex workflows that may involve multiple quantum-chemical program packages. Moreover, such workflows require the exchange of voluminous data that goes beyond simple quantities such as molecular structures and energies. Here, we describe our approach for addressing this interoperability challenge by exchanging electron densities and embedding potentials as grid-based data. We describe the approach that we have implemented to this end in a dedicated code, PyEmbed, currently part of a Python scripting framework. We discuss how it has facilitated the development of quantum-chemical subsystem and embedding methods, and highlight several applications that have been enabled by PyEmbed, including WFT-in-DFT embedding schemes mixing non-relativistic and relativistic electronic structure methods, real-time time-dependent DFT-in-DFT approaches, the density-based many-body expansion, and workflows including real-space data analysis and visualization. Our approach demonstrates in particular the merits of exchanging (complex) grid-based data, and in general the potential of modular software development in quantum chemistry, which hinges upon libraries that facilitate interoperability.
Submitted 29 March, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Solvation effects on halides core spectra with Multilevel Real-Time quantum embedding
Authors:
Jessica A. Martinez B.,
Matteo De Santis,
Michele Pavanello,
Valérie Vallet,
André Severo Pereira Gomes
Abstract:
In this work we introduce a novel subsystem-based electronic structure embedding method that combines the projection-based block-orthogonalized Manby-Miller embedding (BOMME) with the density-based Frozen Density Embedding (FDE) methods. Our approach is effective for systems in which the building blocks interact at varying strengths while still maintaining a lower computational cost compared to a quantum simulation of the entire system. To evaluate the performance of our method, we assess its ability to reproduce the X-ray absorption spectra (XAS) of chloride and fluoride anions in aqueous solutions (based on a 50-water droplet model) via real-time time-dependent density functional theory (rt-TDDFT) calculations. We employ an ensemble approach to compute XAS for the K- and L-edges, utilizing multiple snapshots of configuration space obtained from classical molecular dynamics simulations with a polarizable force field. Configurational averaging influences both the broadening of spectral features and their intensities, with contributions to the final intensities originating from different geometry configurations. We found that embedding models that are too approximate for halide-water specific interactions, as in the case of FDE, fail to reproduce the experimental spectrum for chloride. Meanwhile, BOMME tends to overestimate intensities, particularly for higher energy features because of finite-size effects. Combining FDE for the second solvation shell and retaining BOMME for the first solvation shell mitigates this effect, resulting in an overall improved agreement within the energy range of the experimental spectrum. Additionally, we compute the transition densities of the relevant transitions, confirming that these transitions occur within the halide systems. Thus, our real-time QM/QM/QM embedding method proves to be a promising approach for modeling XAS of solvated systems.
Submitted 25 January, 2024;
originally announced January 2024.
-
Probing interlayer interactions and commensurate-incommensurate transition in twisted bilayer graphene through Raman spectroscopy
Authors:
Vineet Pandey,
Subhendu Mishra,
Nikhilesh Maity,
Sourav Paul,
Abhijith M B,
Ajit Roy,
Nicholas R Glavin,
Kenji Watanabe,
Takashi Taniguchi,
Abhishek Kumar Singh,
Vidya Kochat
Abstract:
Twisted 2D layered materials have recently garnered considerable attention as a class of 2D materials whose interlayer interactions and electronic properties are dictated by the relative rotation (twist angle) between adjacent layers. In this work, we explore a prototype of such a twisted 2D system, artificially stacked twisted bilayer graphene (TBLG), where we use Raman spectroscopy to probe the changes in the interlayer interactions and electron-phonon scattering pathways as the twist angle is varied from 0° to 30°. The long-range Moiré potential of the superlattice gives rise to additional intravalley and intervalley scattering of the electrons in TBLG, which have been investigated through their Raman signatures. The density functional theory (DFT) calculations of the electronic band structure of the TBLG superlattices were found to be in agreement with the resonant Raman excitations across the van Hove singularities in the valence and conduction bands predicted for TBLG due to hybridization of bands from the two layers. We also observe that the relative rotation between the graphene layers has a marked influence on the second-order overtone and combination Raman modes, signalling a commensurate-incommensurate transition in TBLG as the twist angle increases. This serves as a convenient and rapid characterization tool to determine the degree of commensurability in TBLG systems.
Submitted 2 November, 2023;
originally announced November 2023.
-
Finiteness of homoclinic classes on sectional hyperbolic sets
Authors:
A. M. López B,
A. E. Arbieto
Abstract:
We study small perturbations of a sectional hyperbolic set of a vector field on a compact manifold. We obtain, robustly, finiteness of homoclinic classes in this scenario. Moreover, since attractors and repellers are particular cases of homoclinic classes, this result improves (A. M. López B, Finiteness and existence of attractors and repellers on sectional hyperbolic sets, Discrete and Continuous Dynamical Systems-A 37).
Submitted 12 August, 2019;
originally announced August 2019.
-
A machine learning approach to anomaly-based detection on Android platforms
Authors:
Joshua Abah,
Waziri O. V,
Abdullahi M. B,
Arthur U. M,
Adewale O. S
Abstract:
The emergence of mobile platforms with increased storage and computing capabilities, and the pervasive use of these platforms for sensitive applications such as online banking, e-commerce and the storage of sensitive information, have led to increasing danger from malware targeted at these devices. Detecting such malware presents unique challenges, as the signature-based detection techniques available today are becoming inefficient at detecting new and unknown malware. In this research, a machine learning approach for the detection of malware on Android platforms is presented. The detection system monitors and extracts features from applications during execution and uses them to perform in-device detection with a trained K-Nearest Neighbour classifier. The results show high classifier performance, with an accuracy of 93.75%, a low error rate of 6.25%, and a low false positive rate, along with the ability to detect real Android malware.
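The following is a minimal sketch of K-Nearest Neighbour classification of the kind this abstract describes, written in plain Python. It is not the authors' system: the two-dimensional feature vectors, the labels, the choice of Euclidean distance, and the helper names `knn_predict` and `evaluate` are all hypothetical. It also shows that the reported error rate is simply the complement of the accuracy (93.75% + 6.25% = 100%).

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is
    Euclidean (an assumption, the abstract does not name a metric).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(train, test, k=3):
    """Return (accuracy, error_rate) on labelled test pairs."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    accuracy = correct / len(test)
    return accuracy, 1.0 - accuracy


if __name__ == "__main__":
    # Hypothetical runtime features per app; values are made up.
    train = [
        ((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"), ((0.9, 1.1), "benign"),
        ((5.0, 5.0), "malware"), ((5.2, 4.8), "malware"), ((4.9, 5.1), "malware"),
    ]
    test = [((1.1, 1.0), "benign"), ((5.1, 5.0), "malware")]
    accuracy, error_rate = evaluate(train, test)
    print(f"accuracy={accuracy:.2%} error={error_rate:.2%}")
```

A real in-device detector would also need feature scaling and a separate false positive count, which this sketch leaves out.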
Submitted 13 December, 2015;
originally announced December 2015.
-
Homoclinic classes for sectional-hyperbolic sets
Authors:
A. Arbieto,
C. A. Morales,
A. M. Lopez B
Abstract:
We prove that every sectional-hyperbolic Lyapunov stable set contains a nontrivial homoclinic class.
Submitted 18 August, 2014;
originally announced August 2014.
-
Cosmology and thermodynamics of FRW universe with bulk viscous stiff fluid
Authors:
Titus K Mathew,
Aswathy M B,
Manoj M
Abstract:
We consider a cosmological model dominated by a stiff fluid with a constant bulk viscosity. We classify all the possible cases of the universe predicted by the model and analyze the scale factor, the density and the curvature scalar. We find that when the dimensionless constant bulk viscous parameter is in the range $0 < \bar{\zeta} < 6$, the model begins with a Big Bang and makes a transition from the decelerating expansion epoch to an accelerating epoch, then tends to the de Sitter phase as $t \to \infty$. The transition into the accelerating epoch would occur in the recent past when $4 < \bar{\zeta} < 6$. For $\bar{\zeta} > 6$, the model does not have a Big Bang, and the fluid density and scalar curvature increase as the universe expands, eventually saturating as the scale factor $a \to \infty$ in the future. We have analyzed the model with statefinder diagnostics and find that the model is different from the $\Lambda$CDM model but approaches the $\Lambda$CDM point as $a \to \infty$. We have also analyzed the status of the generalized second law of thermodynamics with the apparent horizon as the boundary of the universe and found that the law is generally satisfied when $0 \leq \bar{\zeta} < 6$, and for $\bar{\zeta} > 6$ the law is satisfied when the scale factor is larger than a minimum value.
Submitted 9 June, 2014;
originally announced June 2014.
-
The extended star formation history of the star cluster NGC 2154 in the Large Magellanic Cloud
Authors:
Gustavo Baume,
Giovanni Carraro,
Edgardo Costa,
Rene' A. Mendez B.,
Leo Girardi
Abstract:
The color-magnitude diagram (CMD) of the intermediate-age Large Magellanic Cloud (LMC) star cluster NGC 2154 and its adjacent field has been analyzed using Padova stellar models to determine the cluster's fundamental parameters and its Star Formation History (SFH). Deep $BR$ CCD photometry, together with synthetic CMDs and Integrated Luminosity Functions (ILFs), has allowed us to infer that the cluster experienced an extended star formation period of about 1.2 Gyr, which began approximately 2.3 Gyr ago and ended 1.1 Gyr ago. The physical reality of such a prolonged period of star formation is, however, questionable, and could be the result of inadequacies in the stellar evolutionary tracks themselves. A substantial fraction of binaries (70%) seems to exist in NGC 2154.
Submitted 7 December, 2006;
originally announced December 2006.