Search | arXiv e-print repository

Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

Authors: Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya

Abstract: Evaluating LLMs and text-to-image models is a computationally intensive task that is often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling methods, such as Quality SE and Quality CPD, consistently achieve strong correlations (0.85 to 0.95) with full datasets at a 10% sampling rate; (2) clustering methods excel in specific benchmarks such as MMLU; and (3) no single method universally outperforms others across all metrics. Extending this framework, we leverage the HEIM leaderboard to cover 25 text-to-image models on 17 different benchmarks. SubLIME dynamically selects the optimal technique for each benchmark, significantly reducing evaluation costs while preserving ranking integrity and score distribution. Notably, a minimal sampling rate of 1% proves effective for benchmarks like MMLU. Additionally, we demonstrate that employing difficulty-based sampling to target more challenging benchmark segments enhances model differentiation with broader score distributions. We also combine semantic search, tool use, and GPT-4 review to identify redundancy across benchmarks within specific LLM categories, such as coding benchmarks. This allows us to further reduce the number of samples needed to maintain targeted rank preservation. Overall, SubLIME offers a versatile and cost-effective solution for the robust evaluation of LLMs and text-to-image models.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2404.05571 [pdf, other]

doi 10.1039/D4SM00346B

Wetting on Silicone Surfaces

Authors: Lukas Hauer, Abhinav Naga, Rodrique G. M. Badr, Jonathan T. Pham, William S. Y. Wong, Doris Vollmer

Abstract: Silicone is frequently used as a model system to investigate and tune wetting on soft materials. Silicone is biocompatible and shows excellent thermal, chemical, and UV stability. Moreover, the mechanical properties of the surface can be easily varied by several orders of magnitude in a controlled manner. Polydimethylsiloxane (PDMS) is a popular choice for coating applications such as lubrication, self-cleaning, and drag reduction, facilitated by low surface energy. The aim of understanding the underlying interactions and forces has motivated numerous detailed investigations of the static and dynamic wetting behavior of drops on PDMS-based surfaces. Here, we recognize the three most prevalent PDMS surface variants, namely liquid-infused (SLIPS/LIS), elastomeric, and liquid-like (SOCAL) surfaces. To understand, optimize, and tune the wetting properties of these PDMS surfaces, we review and compare their similarities and differences by discussing (i) the chemical and molecular structure, and (ii) the static and dynamic wetting behavior. We also provide (iii) an overview of methods and techniques to characterize PDMS-based surfaces and their wetting behavior. The static and dynamic wetting ridge is given particular attention, as it dominates energy dissipation, adhesion, and friction of sliding drops and influences the durability of the surfaces. We also discuss special features such as cloaking and wetting-induced phase separation. Key challenges and opportunities of these three surface variants are outlined.

Submitted 1 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.14476 [pdf, ps, other]

Quantifying neural network uncertainty under volatility clustering

Authors: Steven Y. K. Wong, Jennifer S. K. Chan, Lamiae Azizi

Abstract: Time-series with time-varying variance pose a unique challenge to uncertainty quantification (UQ) methods. Time-varying variance, such as volatility clustering as seen in financial time-series, can lead to a large mismatch between predicted uncertainty and forecast error. Building on recent advances in neural network UQ literature, we extend and simplify Deep Evidential Regression and Deep Ensembles into a unified framework to deal with UQ in the presence of volatility clustering. We show that a Scale Mixture Distribution is a simpler alternative to the Normal-Inverse-Gamma prior that provides a favorable complexity-accuracy trade-off. To illustrate the performance of our proposed approach, we apply it to two sets of financial time-series exhibiting volatility clustering: cryptocurrencies and U.S. equities.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 38 pages

arXiv:2208.06519 [pdf, other]

doi 10.1103/PhysRevLett.129.261801

One-Electron Quantum Cyclotron as a Milli-eV Dark-Photon Detector

Authors: Xing Fan, Gerald Gabrielse, Peter W. Graham, Roni Harnik, Thomas G. Myers, Harikrishnan Ramani, Benedict A. D. Sukra, Samuel S. Y. Wong, Yawen Xiao

Abstract: We propose using trapped electrons as high-$Q$ resonators for detecting meV dark photon dark matter. When the rest energy of the dark photon matches the energy splitting of the two lowest cyclotron levels, the first excited state of the electron cyclotron will be resonantly excited. A proof-of-principle measurement, carried out with one electron, demonstrates that the method is background-free over a 7.4 day search. It sets a limit on dark photon dark matter at 148 GHz (0.6 meV) that is around 75 times better than previous constraints. Dark photon dark matter in the 0.1-1 meV mass range (20-200 GHz) could likely be detected at a similar sensitivity in an apparatus designed for dark photon detection.

Submitted 9 January, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

Comments: 6 pages, 5 figures

Journal ref: Phys. Rev. Lett. 129, 261801 (2022)

arXiv:2202.13369 [pdf, other]

Robust Continual Learning through a Comprehensively Progressive Bayesian Neural Network

Authors: Guo Yang, Cheryl Sze Yin Wong, Ramasamy Savitha

Abstract: This work proposes a comprehensively progressive Bayesian neural network for robust continual learning of a sequence of tasks. A Bayesian neural network is progressively pruned and grown such that there are sufficient network resources to represent a sequence of tasks, while the network does not explode. It starts with the contention that similar tasks should have the same number of total network resources, to ensure fair representation of all tasks in a continual learning scenario. Thus, as the data for a new task streams in, sufficient neurons are added to the network such that the total number of neurons in each layer of the network, including the shared representations with previous tasks and individual task-related representations, is equal for all tasks. The weights that are redundant at the end of training each task are also pruned through re-initialization, in order to be efficiently utilized in the subsequent task. Thus, the network grows progressively, but ensures effective utilization of network resources. We refer to our proposed method as 'Robust Continual Learning through a Comprehensively Progressive Bayesian Neural Network (RCL-CPB)' and evaluate the proposed approach on the MNIST data set, under three different continual learning scenarios. Further to this, we evaluate the performance of RCL-CPB on a homogeneous sequence of tasks using split CIFAR100 (20 tasks of 5 classes each), and a heterogeneous sequence of tasks using MNIST, SVHN and CIFAR10 data sets. The demonstrations and the performance results show that the proposed strategies for progressive BNN enable robust continual learning.

Submitted 27 February, 2022; originally announced February 2022.

arXiv:2202.03948

Adaptive two capacitor model to describe slide electrification in moving water drops

Authors: Pravash Bista, Amy Z. Stetten, William S. Y. Wong, Hans-Jürgen Butt, Stefan A. L. Weber

Abstract: Slide electrification is a spontaneous charge separation between a surface and a sliding drop. Here, we describe this effect in terms of a voltage generated at the three-phase contact line. This voltage moves charges between capacitors, one formed by the drop and one on the surface. By introducing an adaptation of the voltage upon water contact, we can model drop charge experiments on many surfaces, including more exotic ones with drop-rate dependent charge polarity. Thus, the adaptive two capacitor model enables new insights into the molecular details of the charge separation mechanism.

Submitted 26 February, 2024; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: The experimental results are accurate, but the physical model proposed in the paper does not fully capture the underlying physics behind the data. We are actively working to develop an improved model that better describes the experimental observations

arXiv:2201.06941 [pdf, other]

Incremental Knowledge Tracing from Multiple Schools

Authors: Sujanya Suresh, Savitha Ramasamy, P. N. Suganthan, Cheryl Sze Yin Wong

Abstract: Knowledge tracing is the task of predicting a learner's future performance based on the history of the learner's performance. Current knowledge tracing models are built based on an extensive set of data that are collected from multiple schools. However, it is impossible to pool learners' data from all schools, due to data privacy and PDPA policies. Hence, this paper explores the feasibility of building knowledge tracing models while preserving the privacy of learners' data within their respective schools. This study is conducted using part of the ASSISTment 2009 dataset, with data from multiple schools being treated as separate tasks in a continual learning framework. The results show that learning sequentially with the Self Attentive Knowledge Tracing (SAKT) algorithm is able to achieve considerably similar performance to that of pooling all the data together.

Submitted 7 January, 2022; originally announced January 2022.

Comments: In AAAI22 AI4EDU Workshop

arXiv:2010.04330 [pdf, other]

doi 10.1007/JHEP01(2021)044

Confinement on $\mathbb{R}^3 \times \mathbb{S}^1$ and Double-String Collapse

Authors: Mathew W. Bub, Erich Poppitz, Samuel S. Y. Wong

Abstract: We study confining strings in ${\cal{N}}=1$ supersymmetric $SU(N_c)$ Yang-Mills theory in the semiclassical regime on $\mathbb{R}^{1,2} \times \mathbb{S}^1$. Static quarks are expected to be confined by double strings composed of two domain walls - which are lines in $\mathbb{R}^2$ - rather than by a single flux tube. Each domain wall carries part of the quarks' chromoelectric flux. We numerically study this mechanism and find that double-string confinement holds for strings of all $N$-alities, except for those between fundamental quarks. We show that, for $N_c \ge 5$, the two domain walls confining unit $N$-ality quarks attract and form non-BPS bound states, collapsing to a single flux line. We determine the $N$-ality dependence of the string tensions for $2 \le N_c \le 10$. Compared to known scaling laws, we find a weaker, almost flat $N$-ality dependence, which is qualitatively explained by the properties of BPS domain walls. We also quantitatively study the behavior of confining strings upon increasing the $\mathbb{S}^1$ size by including the effect of virtual "$W$-bosons" and show that the qualitative features of double-string confinement persist.

Submitted 23 December, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: 53 pages, 23 figures. Version to appear in JHEP. Added references, updated the title, and added a section explaining the differences from confinement in the Polyakov model

Journal ref: J. High Energ. Phys. 2021, 44 (2021)

arXiv:2009.09339 [pdf, other]

On Certificate Management in Named Data Networking

Authors: Zhiyi Zhang, Su Yong Wong, Junxiao Shi, Davide Pesavento, Alexander Afanasyev, Lixia Zhang

Abstract: Named Data Networking (NDN) secures network communications by requiring all data packets to be signed when produced. This requirement necessitates efficient and usable mechanisms to handle NDN certificate issuance and revocation, making these supporting mechanisms essential for NDN operations. In this paper, we first investigate and clarify core concepts related to NDN certificates and security design in general, and then present the model of NDN certificate management and its desired properties. We proceed with the design of a specific realization of NDN's certificate management, NDNCERT, evaluate it using a formal security analysis, and discuss the challenges in designing, implementing, and deploying the system, to share our experiences with other NDN security protocol development efforts.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2003.02515 [pdf, other]

Time-varying neural network for stock return prediction

Authors: Steven Y. K. Wong, Jennifer Chan, Lamiae Azizi, Richard Y. D. Xu

Abstract: We consider the problem of neural network training in a time-varying context. Machine learning algorithms have excelled in problems that do not change over time. However, problems encountered in financial markets are often time-varying. We propose the online early stopping algorithm and show that a neural network trained using this algorithm can track a function changing with unknown dynamics. We compare the proposed algorithm to current approaches on predicting monthly U.S. stock returns and show its superiority. We also show that prominent factors (such as the size and momentum effects) and industry indicators exhibit time-varying stock return predictiveness. We find that during market distress, industry indicators experience an increase in importance at the expense of firm-level features. This indicates that industries play a role in explaining stock returns during periods of heightened risk.

Submitted 22 January, 2021; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: 35 pages, 9 figures

arXiv:1909.10979 [pdf, other]

doi 10.1007/JHEP12(2019)011

Domain walls and deconfinement: a semiclassical picture of discrete anomaly inflow

Authors: Andrew A. Cox, Erich Poppitz, Samuel S. Y. Wong

Abstract: We study the physics of quark deconfinement on domain walls in four-dimensional supersymmetric SU(N) Yang-Mills theory, compactified on a small circle with supersymmetric boundary conditions. We numerically examine the properties of BPS domain walls connecting vacua k units apart. We also determine their electric fluxes and use the results to show that Wilson loops of any nonzero N-ality exhibit perimeter law on all k-walls. Our results confirm and extend, to all N and k, the validity of the semiclassical picture of deconfinement of Anber, Sulejmanpasic and one of us (EP), arXiv:1501.06773, providing a microscopic explanation of mixed 0-form/1-form anomaly inflow.

Submitted 3 December, 2019; v1 submitted 24 September, 2019; originally announced September 2019.

Comments: 49 pages, 18 figures, references added, final version, text and figures identical to published

Journal ref: JHEP12(2019)011

Showing 1–11 of 11 results for author: Wong, S Y