
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

Authors: Qinyuan Ye, Harvey Yiyun Fu, Xiang Ren, Robin Jia

Abstract: We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations? Answering this question has practical implications for LLM users (e.g., deciding which models to try), developers (e.g., prioritizing evaluation on representative tasks), and the research community (e.g., identifying hard-to-predict capabilities that warrant further investigation). We study the performance prediction problem on experiment records from BIG-bench. On a random train-test split, an MLP-based predictor achieves an $R^2$ score greater than 95%, indicating the presence of learnable patterns within the experiment records. We then formulate the problem of searching for "small-bench," an informative subset of BIG-bench tasks from which the performance on the full set can be maximally recovered. We find a subset as informative as BIG-bench Hard for evaluating new model families, while being $3\times$ smaller. Additionally, we find competitive subsets by clustering task representations learned by our MLP-based predictor and selecting tasks close to cluster centroids, highlighting the importance of task diversity in constructing "small-bench."

Submitted 31 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023 Findings. Camera-ready version. Code: https://github.com/INK-USC/predicting-big-bench
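
For readers unfamiliar with the $R^2$ score quoted in the abstract above, the following is a minimal sketch of the standard coefficient of determination. It is not the authors' evaluation code; the function name `r2_score` and the toy inputs are assumptions made for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from the paper.
perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect predictor scores 1, and a predictor that always outputs the mean of the targets scores 0, so a value above 0.95 means the predictor leaves less than 5% of the variance in the held-out records unexplained.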

arXiv:2305.14802 [pdf, other]

Estimating Large Language Model Capabilities without Labeled Test Data

Authors: Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, Robin Jia

Abstract: Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples, but the success of ICL varies widely from task to task. Thus, it is important to quickly determine whether ICL is applicable to a new task, but directly evaluating ICL accuracy can be expensive in situations where test data is expensive to annotate -- the exact situations where ICL is most appealing. In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled test data for that task. To perform ICL accuracy estimation, we propose a method that trains a meta-model using LLM confidence scores as features. We compare our method to several strong accuracy estimation baselines on a new benchmark that covers 4 LLMs and 3 task collections. The meta-model improves over all baselines across 8 out of 12 settings and achieves the same estimation performance as directly evaluating on 40 collected labeled test examples per task. At the same time, no existing approach provides an accurate and reliable ICL accuracy estimation in every setting, highlighting the need for better ways to measure the uncertainty of LLM predictions.

Submitted 26 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023 Findings. Camera-ready version. Code: https://github.com/harvey-fin/icl-estimate

arXiv:2302.13006 [pdf]

doi 10.1109/JLT.2023.3250224

All-polarization-maintaining linear cavity fiber lasers mode-locked by nonlinear polarization evolution in stretched pulse regime

Authors: Xuanyi Liu, Feng Ye, Minghe Zhao, Boris A. Malomed, H. Y. Fu, Qian Li

Abstract: Nonlinear polarization evolution (NPE) is among the most advanced techniques for obtaining ultrashort pulses with excellent optical performance. However, it is challenging to design environmentally stable NPE fiber oscillators using only polarization-maintaining (PM) fibers. Here, we use the same PM fiber and non-reciprocal phase shifter to design two different devices, which are capable of acting as effective NPE saturable absorbers (SAs) in two all-PM linear cavity fiber lasers. These two laser setups differ in the position of the non-reciprocal phase shifter, the presence of which is crucial for the proposed fiber lasers to reduce their mode-locking thresholds and achieve high repetition rates above 100 MHz. The mode-locking principle and pulse evolution in the laser cavity are investigated theoretically. The first all-PM fiber oscillator emits sub-200 fs stretched pulses with low peak powers. The second oscillator, with a simpler architecture, directly delivers stretched pulses with high peak powers, the spectral bandwidth greater than 30 nm, and the pulse duration less than 90 fs. To the best of our knowledge, 79 fs achieved in this design is the shortest pulse duration provided by PM fiber lasers using NPE mode-lockers.

Submitted 25 February, 2023; originally announced February 2023.

Comments: to be published in J. Lightwave Tech

arXiv:2211.10600 [pdf]

doi 10.1016/j.optlastec.2022.108941

Robust mode-locking in a hybrid ultrafast laser based on nonlinear multimodal interference

Authors: Xuanyi Liu, Maolin Dai, Denghui Pan, Kaibin Lin, Boris A. Malomed, Qian Li, H. Y. Fu

Abstract: We experimentally demonstrate the realization of a half-polarization-maintaining (half-PM) fiber laser, in which mode-locking is provided by a reflective multimode-interference saturable absorber (SA). In the specially designed SA, linearly polarized light is coupled into a 15-cm-long graded-index multimode fiber (GIMF) through the PM fiber, and then reflected back to the PM structure through a mirror pigtailed with a single-mode fiber (SMF). The modulation depth and saturation peak power are measured to be 1.5% and 0.6 W, respectively. The proposed SA device is incorporated into a novel half-PM erbium-doped fiber oscillator, which generates soliton pulses with 409 fs temporal duration at a 33.3 MHz repetition rate. The proposed fiber laser is compared with a conventional non-PM fiber laser mode-locked by nonlinear polarization evolution (NPE) in terms of optical properties such as spectral bandwidth, pulse duration, and stability performance. Short- and long-time stability tests and superior noise performance corroborate robust mode-locking in this setup.

Submitted 19 November, 2022; originally announced November 2022.

Comments: to be published in Optics and Laser Technology

arXiv:2208.01459 [pdf, other]

doi 10.1002/lpor.202200487

Observation of SQUID-like behavior in fiber laser with intra-cavity epsilon-near-zero effect

Authors: Jiaye Wu, Xuanyi Liu, Boris A. Malomed, Kuan-Chang Chang, Minghe Zhao, Kang Qi, Yanhua Sha, Ze Tao Xie, Marco Clementi, Camille-Sophie Brès, Shengdong Zhang, H. Y. Fu, Qian Li

Abstract: Establishing relations between fundamental effects in far-flung areas of physics is a subject of great interest in the current research. We here report realization of a novel photonic system akin to the radio-frequency superconducting quantum interference device (RF-SQUID), in a fiber laser cavity with epsilon-near-zero (ENZ) nanolayers as intra-cavity components. Emulating the RF-SQUID scheme, the photonic counterpart of the supercurrent, represented by the optical wave, circulates in the cavity, passing through effective optical potential barriers. Different ENZ wavelengths translate into distinct spectral outputs through the variation of cavity resonances, emulating the situation with a frequency-varying tank circuit in the RF-SQUID. Due to the presence of the ENZ element, the optical potential barrier is far lower for selected frequency components, granting them advantage in the gain-resource competition. The findings reported in this work provide a deeper insight into the ultrafast ENZ photonics, revealing a new path towards the design of nanophotonic on-chip devices with various operational functions, and offer a new approach to study superconducting and quantum-mechanical systems.

Submitted 2 August, 2022; originally announced August 2022.

Comments: to be published in Laser & Photonics Reviews

arXiv:2206.08122 [pdf]

Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures

Authors: Vladimir Belov, Tracy Erwin-Grabner, Ali Saffet Gonul, Alyssa R. Amod, Amar Ojha, Andre Aleman, Annemiek Dols, Anouk Scharntee, Aslihan Uyar-Demir, Ben J Harrison, Benson M. Irungu, Bianca Besteher, Bonnie Klimes-Dougan, Brenda W. J. H. Penninx, Bryon A. Mueller, Carlos Zarate, Christopher G. Davey, Christopher R. K. Ching, Colm G. Connolly, Cynthia H. Y. Fu, Dan J. Stein, Danai Dima, David E. J. Linden, David M. A. Mehler, Edith Pomarol-Clotet, et al. (41 additional authors not shown)

Abstract: Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (n=5,356) to provide a generalizable ML classification benchmark of major depressive disorder (MDD). Using brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD vs healthy controls (HC) with around 62% balanced accuracy, but when harmonizing the data using ComBat balanced accuracy dropped to approximately 52%. Similar results were observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may achieve more encouraging prospects.

Submitted 25 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: main document 37 pages; supplementary material 24 pages
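
The abstract above reports balanced accuracy rather than plain accuracy. As a minimal sketch (not the authors' pipeline), balanced accuracy for a two-class problem is the mean of sensitivity and specificity. The function name and the toy labels below are assumptions made for illustration, with 1 standing for MDD and 0 for healthy control.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    sensitivity = true_pos / n_pos  # recall on the positive class
    specificity = true_neg / n_neg  # recall on the negative class
    return 0.5 * (sensitivity + specificity)


# Hypothetical imbalanced sample: 8 controls, 2 patients, and a
# classifier that labels everyone as a control.
labels = [0] * 8 + [1] * 2
predictions = [0] * 10
plain_accuracy = sum(t == p for t, p in zip(labels, predictions)) / len(labels)
score = balanced_accuracy(labels, predictions)
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.5, the chance level for two classes. That is why a drop from about 62% to about 52% after harmonization brings the classifier close to chance.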

arXiv:2202.11144 [pdf, other]

Robust Hierarchical Patterns for identifying MDD patients: A Multisite Study

Authors: Dushyant Sahoo, Mathilde Antoniades, Cynthia H. Y. Fu, Christos Davatzikos

Abstract: Many supervised machine learning frameworks have been proposed for disease classification using functional magnetic resonance imaging (fMRI) data, producing important biomarkers. More recently, data pooling has flourished, making the result generalizable across a large population. But, this success depends on the population diversity and variability introduced due to the pooling of the data that is not a primary research interest. Here, we look at hierarchical Sparse Connectivity Patterns (hSCPs) as biomarkers for major depressive disorder (MDD). We propose a novel model based on hSCPs to predict MDD patients from functional connectivity matrices extracted from resting-state fMRI data. Our model consists of three coupled terms. The first term decomposes connectivity matrices into hierarchical low-rank sparse components corresponding to synchronous patterns across the human brain. These components are then combined via patient-specific weights capturing heterogeneity in the data. The second term is a classification loss that uses the patient-specific weights to classify MDD patients from healthy ones. Both of these terms are combined with the third term, a robustness loss function to improve the reproducibility of hSCPs. This reduces the variability introduced due to site and population diversity (age and sex) on the predictive accuracy and pattern stability in a large dataset pooled from five different sites. Our results show the impact of diversity on prediction performance. Our model can reduce diversity and improve the predictive and generalizing capability of the components. Finally, our results show that our proposed model can robustly identify clinically relevant patterns characteristic of MDD with high reproducibility.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2110.11347 [pdf]

Multidimensional representations in late-life depression: convergence in neuroimaging, cognition, clinical symptomatology and genetics

Authors: Junhao Wen, Cynthia H. Y. Fu, Duygu Tosun, Yogasudha Veturi, Zhijian Yang, Ahmed Abdulkadir, Elizabeth Mamourian, Dhivya Srinivasan, Jingxuan Bao, Guray Erus, Haochang Shou, Mohamad Habes, Jimit Doshi, Erdem Varol, Scott R Mackin, Aristeidis Sotiras, Yong Fan, Andrew J. Saykin, Yvette I. Sheline, Li Shen, Marylyn D. Ritchie, David A. Wolk, Marilyn Albert, Susan M. Resnick, Christos Davatzikos

Abstract: Late-life depression (LLD) is characterized by considerable heterogeneity in clinical manifestation. Unraveling such heterogeneity would aid in elucidating etiological mechanisms and pave the road to precision and individualized medicine. We sought to delineate, cross-sectionally and longitudinally, disease-related heterogeneity in LLD linked to neuroanatomy, cognitive functioning, clinical symptomatology, and genetic profiles. Multimodal data from a multicentre sample (N=996) were analyzed. A semi-supervised clustering method (HYDRA) was applied to regional grey matter (GM) brain volumes to derive dimensional representations. Two dimensions were identified, which accounted for the LLD-related heterogeneity in voxel-wise GM maps, white matter (WM) fractional anisotropy (FA), neurocognitive functioning, clinical phenotype, and genetics. Dimension one (Dim1) demonstrated relatively preserved brain anatomy without WM disruptions relative to healthy controls. In contrast, dimension two (Dim2) showed widespread brain atrophy and WM integrity disruptions, along with cognitive impairment and higher depression severity. Moreover, one de novo independent genetic variant (rs13120336) was significantly associated with Dim1 but not with Dim2. Notably, the two dimensions demonstrated significant SNP-based heritability of 18-27% within the general population (N=12,518 in UKBB). Lastly, in a subset of individuals having longitudinal measurements, Dim2 demonstrated a more rapid longitudinal decrease in GM and brain age, and was more likely to progress to Alzheimer's disease, compared to Dim1 (N=1,413 participants and 7,225 scans from ADNI, BLSA, and BIOCARD datasets).

Submitted 25 October, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

arXiv:2109.14202 [pdf]

A Robust and Novel Linear Fiber Laser Mode-locked by Nonlinear Polarization Evolution in All-polarization-maintaining Fibers

Authors: Xuanyi Liu, Qian Li, Denghui Pan, Feng Ye, Boris A. Malomed, H. Y. Fu

Abstract: We demonstrate a novel, robust and compact fiber laser mode-locked by nonlinear polarization evolution (NPE) in polarization-maintaining (PM) fibers. The reflectivity of the artificial saturable absorber (SA) is analyzed to explain the mode-locking mechanism in the laser cavity. Experimentally, three linear laser schemes that feature repetition rates 94 MHz, 124 MHz and 133 MHz are systematically investigated. When the pump power is 1100 mW, the 124-MHz laser cavity delivers highly stable pulses with a single-pulse energy of 0.92 nJ. After the compression, the pulse duration obtained from the 124-MHz fiber laser is 250 fs, while the corresponding transform-limited pulse duration is 124 fs. The highest fundamental repetition rate that could be achieved in our experiment is 133 MHz, as mentioned above. The noise characterization has been performed with different cavity lengths and therefore different net-cavity dispersion. The 68-fs timing jitter and the 0.01% relative intensity noise (RIN) of the 133-MHz fiber laser have been realized integrated from 1 kHz to 10 MHz. Furthermore, the root-mean-square (RMS) power fluctuation is 0.35% in 2 hours, which implies superior stability of the output power. Thus, this linear fiber oscillator provides a competitive low-noise light source for optical applications appropriate for complex environments.

Submitted 29 September, 2021; originally announced September 2021.

Comments: To be published in Journal of Lightwave Technology

arXiv:2106.07872 [pdf, other]

Planar multi-aperture fish-eye lens using metagrating

Authors: Zihan Zang, Haoqiang Wang, Yanjun Han, Hongtao Li, H. Y. FU, Yi Luo

Abstract: The design of compact optical systems with large field of view has been difficult due to the requirement of many elements or a curved focal plane to reduce off-axis aberration. We propose a multi-aperture lens design to effectively resolve these issues. Metagrating-based deflectors are placed near entrance pupils of multi-aperture lens array to enhance field of view. A systematic design method is given in detail. In design examples, a $\pm$80$^\circ$ field of view using only two planar optical elements is achieved. Also, the system is extremely compact with total track lengths an order of magnitude smaller than conventional fish-eye lenses, while the imaging performance is comparable with conventional designs.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2103.05360 [pdf]

Ultrafast Parallel LiDAR with Time-encoding and Spectral Scanning: Breaking the Time-of-flight Limit

Authors: Zihan Zang, Zhi Li, Yi Luo, Yanjun Han, Xuanyi Liu, H. Y. Fu

Abstract: Light detection and ranging (LiDAR) has been widely used in autonomous driving and large-scale manufacturing. Although state-of-the-art scanning LiDAR can perform long-range three-dimensional imaging, the frame rate is limited by both round-trip delay and the beam steering speed, hindering the development of high-speed autonomous vehicles. For hundred-meter level ranging applications, a several-time speedup is highly desirable. Here, we uniquely combine fiber-based encoders with wavelength-division multiplexing devices to implement all-optical time-encoding on the illumination light. Using this method, parallel detection and fast inertia-free spectral scanning can be achieved simultaneously with single-pixel detection. As a result, the frame rate of a scanning LiDAR can be multiplied with scalability. We demonstrate a 4.4-fold speedup for a maximum 75-m detection range, compared with a time-of-flight-limited laser ranging system. This approach has the potential to improve the velocity of LiDAR-based autonomous vehicles to the regime of hundred kilometers per hour and open up a new paradigm for ultrafast-frame-rate LiDAR imaging.

Submitted 9 March, 2021; originally announced March 2021.
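
To make the time-of-flight limit in the abstract above concrete, here is a minimal arithmetic sketch. It assumes a single-beam system that waits for each echo before sending the next pulse, and it assumes the quoted 4.4-fold speedup can be read as a multiplier on that measurement rate. The function names are hypothetical, and this is not the authors' code.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by SI definition


def round_trip_time_s(max_range_m):
    """Time for a pulse to reach a target at max_range_m and return."""
    return 2.0 * max_range_m / SPEED_OF_LIGHT_M_PER_S


def tof_limited_rate_hz(max_range_m):
    """Highest unambiguous measurement rate when pulses are sent one at a time."""
    return 1.0 / round_trip_time_s(max_range_m)


delay = round_trip_time_s(75.0)        # about 0.5 microseconds
base_rate = tof_limited_rate_hz(75.0)  # about 2 MHz
# Assumption: the 4.4-fold speedup scales the measurement rate directly.
sped_up_rate = 4.4 * base_rate
```

For a 75 m maximum range the round trip takes about 0.5 µs, so a pulse-by-pulse system is capped near 2 million measurements per second regardless of how fast the beam can be steered.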

arXiv:1911.11917 [pdf, other]

doi 10.1364/OE.27.037298

Self-interaction of ultrashort pulses in an epsilon-near-zero nonlinear material at the telecom wavelength

Authors: Jiaye Wu, Boris A. Malomed, H. Y. Fu, Qian Li

Abstract: Dynamics of femtosecond pulses with the telecom carrier wavelength is investigated numerically in a subwavelength layer of an indium tin oxide (ITO) epsilon-near-zero (ENZ) material with high dispersion and high nonlinearity. Due to the subwavelength thickness of the ITO ENZ material, and the fact that the pulse's propagation time is shorter than its temporal width, multiple reflections give rise to self-interaction in both spectral and temporal domains, especially at wavelengths longer than the ENZ point, at which the reflections are significantly stronger. A larger absolute value of the pulse's chirp strongly affects the self-interaction by redistributing energy between wavelengths, while the sign of the chirp affects the interaction in the temporal domain. It is also found that, when two identical pulses are launched simultaneously from both ends, a subwavelength counterpart of a standing-wave state can be established. It shows robust energy localization in the middle of the sample, in terms of both the spectral and temporal intensity distributions.

Submitted 26 November, 2019; originally announced November 2019.

Comments: Accepted, to be published in Optics Express. © 2019 Optical Society of America. Users may use, reuse, and build upon the article, or use the article for text or data mining, so long as such uses are for non-commercial purposes and appropriate attribution is maintained. All other rights are reserved

Journal ref: Optics Express 27(26): 37298-37307, 2019

arXiv:1705.10312 [pdf]

Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model

Authors: Dajiang Zhu, Brandalyn C. Riedel, Neda Jahanshad, Nynke A. Groenewold, Dan J. Stein, Ian H. Gotlib, Matthew D. Sacchet, Danai Dima, James H. Cole, Cynthia H. Y. Fu, Henrik Walter, Ilya M. Veer, Thomas Frodl, Lianne Schmaal, Dick J. Veltman, Paul M. Thompson

Abstract: Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large scale distributed imaging and biobank data.

Submitted 3 June, 2017; v1 submitted 26 May, 2017; originally announced May 2017.

Comments: Accepted by MICCAI 2017

Showing 1–13 of 13 results for author: Fu, H Y