-
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
Authors:
Qinyuan Ye,
Harvey Yiyun Fu,
Xiang Ren,
Robin Jia
Abstract:
We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations? Answering this question has practical implications for LLM users (e.g., deciding which models to try), developers (e.g., prioritizing evaluation on representative tasks), and the research community (e.g., identifying hard-to-predict capabilities that warrant further investigation).
We study the performance prediction problem on experiment records from BIG-bench. On a random train-test split, an MLP-based predictor achieves an $R^2$ score greater than 95%, indicating the presence of learnable patterns within the experiment records. We then formulate the problem of searching for "small-bench," an informative subset of BIG-bench tasks from which the performance on the full set can be maximally recovered. We find a subset as informative as BIG-bench Hard for evaluating new model families, while being $3\times$ smaller. Additionally, we find competitive subsets by clustering task representations learned by our MLP-based predictor and selecting tasks close to cluster centroids, highlighting the importance of task diversity in constructing "small-bench."
Submitted 31 October, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
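
The $R^2$ score quoted in the abstract above is the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The following is a minimal sketch of that formula only; the accuracy values are hypothetical illustration data, not BIG-bench records, and the function name `r2_score` is an assumption of this example rather than the authors' code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictor.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out task accuracies and predictor outputs.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
score = r2_score(observed, predicted)
print(f"R^2 = {score:.2f}")
```

A score near 1 means the predictor explains almost all of the variance in the held-out records; a score of 0 means it does no better than predicting the mean.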
-
Estimating Large Language Model Capabilities without Labeled Test Data
Authors:
Harvey Yiyun Fu,
Qinyuan Ye,
Albert Xu,
Xiang Ren,
Robin Jia
Abstract:
Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples, but the success of ICL varies widely from task to task. Thus, it is important to quickly determine whether ICL is applicable to a new task, but directly evaluating ICL accuracy can be expensive in situations where test data is expensive to annotate -- the exact situations where ICL is most appealing. In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled test data for that task. To perform ICL accuracy estimation, we propose a method that trains a meta-model using LLM confidence scores as features. We compare our method to several strong accuracy estimation baselines on a new benchmark that covers 4 LLMs and 3 task collections. The meta-model improves over all baselines across 8 out of 12 settings and achieves the same estimation performance as directly evaluating on 40 collected labeled test examples per task. At the same time, no existing approach provides an accurate and reliable ICL accuracy estimation in every setting, highlighting the need for better ways to measure the uncertainty of LLM predictions.
Submitted 26 October, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
All-polarization-maintaining linear cavity fiber lasers mode-locked by nonlinear polarization evolution in stretched pulse regime
Authors:
Xuanyi Liu,
Feng Ye,
Minghe Zhao,
Boris A. Malomed,
H. Y. Fu,
Qian Li
Abstract:
Nonlinear polarization evolution (NPE) is among the most advanced techniques for obtaining ultrashort pulses with excellent optical performance. However, it is challenging to design environmentally stable NPE fiber oscillators using only polarization-maintaining (PM) fibers. Here, we use the same PM fiber and non-reciprocal phase shifter to design two different devices, which are capable of acting as effective NPE saturable absorbers (SAs) in two all-PM linear cavity fiber lasers. These two laser setups differ in the position of the non-reciprocal phase shifter, the presence of which is crucial for the proposed fiber lasers to reduce their mode-locking thresholds and achieve high repetition rates above 100 MHz. The mode-locking principle and pulse evolution in the laser cavity are investigated theoretically. The first all-PM fiber oscillator emits sub-200 fs stretched pulses with low peak powers. The second oscillator, with a simpler architecture, directly delivers stretched pulses with high peak powers, the spectral bandwidth greater than 30 nm, and the pulse duration less than 90 fs. To the best of our knowledge, 79 fs achieved in this design is the shortest pulse duration provided by PM fiber lasers using NPE mode-lockers.
Submitted 25 February, 2023;
originally announced February 2023.
-
Robust mode-locking in a hybrid ultrafast laser based on nonlinear multimodal interference
Authors:
Xuanyi Liu,
Maolin Dai,
Denghui Pan,
Kaibin Lin,
Boris A. Malomed,
Qian Li,
H. Y. Fu
Abstract:
We experimentally demonstrate the realization of a half-polarization-maintaining (half-PM) fiber laser, in which mode-locking is provided by a reflective multimode-interference saturable absorber (SA). In the specially designed SA, linearly polarized light is coupled into a 15-cm-long graded-index multimode fiber (GIMF) through the PM fiber, and then reflected back to the PM structure through a mirror pigtailed with a single-mode fiber (SMF). The modulation depth and saturation peak power are measured to be 1.5% and 0.6 W, respectively. The proposed SA device is incorporated into a novel half-PM erbium-doped fiber oscillator, which generates soliton pulses with 409 fs temporal duration at a 33.3 MHz repetition rate. The proposed fiber laser is compared with a conventional non-PM fiber laser mode-locked by nonlinear polarization evolution (NPE) in terms of optical properties such as spectral bandwidth, pulse duration, and stability performance. Short- and long-time stability tests and superior noise performance corroborate robust mode-locking in this setup.
Submitted 19 November, 2022;
originally announced November 2022.
-
Observation of SQUID-like behavior in fiber laser with intra-cavity epsilon-near-zero effect
Authors:
Jiaye Wu,
Xuanyi Liu,
Boris A. Malomed,
Kuan-Chang Chang,
Minghe Zhao,
Kang Qi,
Yanhua Sha,
Ze Tao Xie,
Marco Clementi,
Camille-Sophie Brès,
Shengdong Zhang,
H. Y. Fu,
Qian Li
Abstract:
Establishing relations between fundamental effects in far-flung areas of physics is a subject of great interest in the current research. We here report realization of a novel photonic system akin to the radio-frequency superconducting quantum interference device (RF-SQUID), in a fiber laser cavity with epsilon-near-zero (ENZ) nanolayers as intra-cavity components. Emulating the RF-SQUID scheme, the photonic counterpart of the supercurrent, represented by the optical wave, circulates in the cavity, passing through effective optical potential barriers. Different ENZ wavelengths translate into distinct spectral outputs through the variation of cavity resonances, emulating the situation with a frequency-varying tank circuit in the RF-SQUID. Due to the presence of the ENZ element, the optical potential barrier is far lower for selected frequency components, granting them advantage in the gain-resource competition. The findings reported in this work provide a deeper insight into the ultrafast ENZ photonics, revealing a new path towards the design of nanophotonic on-chip devices with various operational functions, and offer a new approach to study superconducting and quantum-mechanical systems.
Submitted 2 August, 2022;
originally announced August 2022.
-
Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures
Authors:
Vladimir Belov,
Tracy Erwin-Grabner,
Ali Saffet Gonul,
Alyssa R. Amod,
Amar Ojha,
Andre Aleman,
Annemiek Dols,
Anouk Scharntee,
Aslihan Uyar-Demir,
Ben J Harrison,
Benson M. Irungu,
Bianca Besteher,
Bonnie Klimes-Dougan,
Brenda W. J. H. Penninx,
Bryon A. Mueller,
Carlos Zarate,
Christopher G. Davey,
Christopher R. K. Ching,
Colm G. Connolly,
Cynthia H. Y. Fu,
Dan J. Stein,
Danai Dima,
David E. J. Linden,
David M. A. Mehler,
Edith Pomarol-Clotet, et al. (41 additional authors not shown)
Abstract:
Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (n=5,356) to provide a generalizable ML classification benchmark of major depressive disorder (MDD). Using brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD vs healthy controls (HC) with around 62% balanced accuracy, but when harmonizing the data using ComBat, balanced accuracy dropped to approximately 52%. Similar results were observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods, may achieve more encouraging prospects.
Submitted 25 October, 2022; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Robust Hierarchical Patterns for identifying MDD patients: A Multisite Study
Authors:
Dushyant Sahoo,
Mathilde Antoniades,
Cynthia H. Y. Fu,
Christos Davatzikos
Abstract:
Many supervised machine learning frameworks have been proposed for disease classification using functional magnetic resonance imaging (fMRI) data, producing important biomarkers. More recently, data pooling has flourished, making the result generalizable across a large population. But, this success depends on the population diversity and variability introduced due to the pooling of the data that is not a primary research interest. Here, we look at hierarchical Sparse Connectivity Patterns (hSCPs) as biomarkers for major depressive disorder (MDD). We propose a novel model based on hSCPs to predict MDD patients from functional connectivity matrices extracted from resting-state fMRI data. Our model consists of three coupled terms. The first term decomposes connectivity matrices into hierarchical low-rank sparse components corresponding to synchronous patterns across the human brain. These components are then combined via patient-specific weights capturing heterogeneity in the data. The second term is a classification loss that uses the patient-specific weights to classify MDD patients from healthy ones. Both of these terms are combined with the third term, a robustness loss function to improve the reproducibility of hSCPs. This reduces the variability introduced due to site and population diversity (age and sex) on the predictive accuracy and pattern stability in a large dataset pooled from five different sites. Our results show the impact of diversity on prediction performance. Our model can reduce diversity and improve the predictive and generalizing capability of the components. Finally, our results show that our proposed model can robustly identify clinically relevant patterns characteristic of MDD with high reproducibility.
Submitted 22 February, 2022;
originally announced February 2022.
-
Multidimensional representations in late-life depression: convergence in neuroimaging, cognition, clinical symptomatology and genetics
Authors:
Junhao Wen,
Cynthia H. Y. Fu,
Duygu Tosun,
Yogasudha Veturi,
Zhijian Yang,
Ahmed Abdulkadir,
Elizabeth Mamourian,
Dhivya Srinivasan,
Jingxuan Bao,
Guray Erus,
Haochang Shou,
Mohamad Habes,
Jimit Doshi,
Erdem Varol,
Scott R Mackin,
Aristeidis Sotiras,
Yong Fan,
Andrew J. Saykin,
Yvette I. Sheline,
Li Shen,
Marylyn D. Ritchie,
David A. Wolk,
Marilyn Albert,
Susan M. Resnick,
Christos Davatzikos
Abstract:
Late-life depression (LLD) is characterized by considerable heterogeneity in clinical manifestation. Unraveling such heterogeneity would aid in elucidating etiological mechanisms and pave the road to precision and individualized medicine. We sought to delineate, cross-sectionally and longitudinally, disease-related heterogeneity in LLD linked to neuroanatomy, cognitive functioning, clinical symptomatology, and genetic profiles. Multimodal data from a multicentre sample (N=996) were analyzed. A semi-supervised clustering method (HYDRA) was applied to regional grey matter (GM) brain volumes to derive dimensional representations. Two dimensions were identified, which accounted for the LLD-related heterogeneity in voxel-wise GM maps, white matter (WM) fractional anisotropy (FA), neurocognitive functioning, clinical phenotype, and genetics. Dimension one (Dim1) demonstrated relatively preserved brain anatomy without WM disruptions relative to healthy controls. In contrast, dimension two (Dim2) showed widespread brain atrophy and WM integrity disruptions, along with cognitive impairment and higher depression severity. Moreover, one de novo independent genetic variant (rs13120336) was significantly associated with Dim1 but not with Dim2. Notably, the two dimensions demonstrated significant SNP-based heritability of 18-27% within the general population (N=12,518 in UKBB). Lastly, in a subset of individuals having longitudinal measurements, Dim2 demonstrated a more rapid longitudinal decrease in GM and brain age, and was more likely to progress to Alzheimer's disease, compared to Dim1 (N=1,413 participants and 7,225 scans from ADNI, BLSA, and BIOCARD datasets).
Submitted 25 October, 2021; v1 submitted 20 October, 2021;
originally announced October 2021.
-
A Robust and Novel Linear Fiber Laser Mode-locked by Nonlinear Polarization Evolution in All-polarization-maintaining Fibers
Authors:
Xuanyi Liu,
Qian Li,
Denghui Pan,
Feng Ye,
Boris A. Malomed,
H. Y. Fu
Abstract:
We demonstrate a novel, robust and compact fiber laser mode-locked by nonlinear polarization evolution (NPE) in polarization-maintaining (PM) fibers. The reflectivity of the artificial saturable absorber (SA) is analyzed to explain the mode-locking mechanism in the laser cavity. Experimentally, three linear laser schemes with repetition rates of 94 MHz, 124 MHz and 133 MHz are systematically investigated. When the pump power is 1100 mW, the 124-MHz laser cavity delivers highly stable pulses with a single-pulse energy of 0.92 nJ. After compression, the pulse duration obtained from the 124-MHz fiber laser is 250 fs, while the corresponding transform-limited pulse duration is 124 fs. The highest fundamental repetition rate achieved in our experiment is 133 MHz. The noise characterization has been performed with different cavity lengths and therefore different net cavity dispersion. The 133-MHz fiber laser exhibits a timing jitter of 68 fs and a relative intensity noise (RIN) of 0.01%, both integrated from 1 kHz to 10 MHz. Furthermore, the root-mean-square (RMS) power fluctuation is 0.35% over 2 hours, which implies superior stability of the output power. Thus, this linear fiber oscillator provides a competitive low-noise light source for optical applications appropriate for complex environments.
Submitted 29 September, 2021;
originally announced September 2021.
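
The figures quoted in the abstract above can be related by two standard identities: the average power of a pulse train is the pulse energy times the repetition rate, and the ratio of measured to transform-limited duration indicates how far the compressed pulse is from its limit. The sketch below only does this arithmetic on the quoted numbers; the derived values are implied by the abstract, not reported in it, and the function names are assumptions of this example.

```python
def average_power_w(pulse_energy_j, rep_rate_hz):
    """Average power of a pulse train: energy per pulse times pulses per second."""
    return pulse_energy_j * rep_rate_hz


def times_transform_limit(measured_fs, limit_fs):
    """How many times longer the measured pulse is than its transform limit."""
    return measured_fs / limit_fs


# Values quoted for the 124-MHz cavity: 0.92 nJ, 250 fs measured, 124 fs limit.
p_avg = average_power_w(0.92e-9, 124e6)
ratio = times_transform_limit(250.0, 124.0)
print(f"average power ~ {p_avg * 1e3:.0f} mW, {ratio:.2f}x transform limit")
```

Whether the 0.92 nJ figure refers to the output or the intracavity pulse is not stated in the abstract, so the derived average power should be read with that caveat.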
-
Planar multi-aperture fish-eye lens using metagrating
Authors:
Zihan Zang,
Haoqiang Wang,
Yanjun Han,
Hongtao Li,
H. Y. Fu,
Yi Luo
Abstract:
The design of compact optical systems with large field of view has been difficult due to the requirement of many elements or a curved focal plane to reduce off-axis aberration. We propose a multi-aperture lens design to effectively resolve these issues. Metagrating-based deflectors are placed near the entrance pupils of a multi-aperture lens array to enhance the field of view. A systematic design method is given in detail. In design examples, a $\pm$80$^\circ$ field of view using only two planar optical elements is achieved. Also, the system is extremely compact, with total track lengths an order of magnitude smaller than conventional fish-eye lenses, while the imaging performance is comparable with conventional designs.
Submitted 15 June, 2021;
originally announced June 2021.
-
Ultrafast Parallel LiDAR with Time-encoding and Spectral Scanning: Breaking the Time-of-flight Limit
Authors:
Zihan Zang,
Zhi Li,
Yi Luo,
Yanjun Han,
Xuanyi Liu,
H. Y. Fu
Abstract:
Light detection and ranging (LiDAR) has been widely used in autonomous driving and large-scale manufacturing. Although state-of-the-art scanning LiDAR can perform long-range three-dimensional imaging, the frame rate is limited by both round-trip delay and the beam steering speed, hindering the development of high-speed autonomous vehicles. For hundred-meter level ranging applications, a several-time speedup is highly desirable. Here, we uniquely combine fiber-based encoders with wavelength-division multiplexing devices to implement all-optical time-encoding on the illumination light. Using this method, parallel detection and fast inertia-free spectral scanning can be achieved simultaneously with single-pixel detection. As a result, the frame rate of a scanning LiDAR can be multiplied with scalability. We demonstrate a 4.4-fold speedup for a maximum 75-m detection range, compared with a time-of-flight-limited laser ranging system. This approach has the potential to improve the velocity of LiDAR-based autonomous vehicles to the regime of hundred kilometers per hour and open up a new paradigm for ultrafast-frame-rate LiDAR imaging.
Submitted 9 March, 2021;
originally announced March 2021.
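
The time-of-flight limit named in the title above follows from the round-trip delay $t = 2R/c$: a scanner that waits for each echo before sending the next pulse cannot measure more than $1/t$ points per second. The sketch below works this out for the 75-m range and 4.4-fold speedup quoted in the abstract. It assumes one pulse in flight at a time and the vacuum speed of light; the resulting point rates are illustrative and are not figures reported by the authors.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def round_trip_time_s(max_range_m):
    """Time for light to reach a target at max_range_m and return."""
    return 2.0 * max_range_m / C_M_PER_S


def tof_limited_rate_hz(max_range_m):
    """Maximum point rate when each pulse must return before the next is sent."""
    return 1.0 / round_trip_time_s(max_range_m)


base_rate = tof_limited_rate_hz(75.0)  # sequential time-of-flight ranging
parallel_rate = 4.4 * base_rate        # applying the speedup quoted in the abstract
print(f"round trip: {round_trip_time_s(75.0) * 1e9:.1f} ns")
print(f"ToF-limited: {base_rate / 1e6:.2f} MHz, with 4.4x: {parallel_rate / 1e6:.2f} MHz")
```

Doubling the range halves the time-of-flight-limited rate, which is why a multiplicative speedup matters most for long-range scanning.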
-
Self-interaction of ultrashort pulses in an epsilon-near-zero nonlinear material at the telecom wavelength
Authors:
Jiaye Wu,
Boris A. Malomed,
H. Y. Fu,
Qian Li
Abstract:
Dynamics of femtosecond pulses with the telecom carrier wavelength is investigated numerically in a subwavelength layer of an indium tin oxide (ITO) epsilon-near-zero (ENZ) material with high dispersion and high nonlinearity. Due to the subwavelength thickness of the ITO ENZ material, and the fact that the pulse's propagation time is shorter than its temporal width, multiple reflections give rise to self-interaction in both spectral and temporal domains, especially at wavelengths longer than the ENZ point, at which the reflections are significantly stronger. A larger absolute value of the pulse's chirp strongly affects the self-interaction by redistributing energy between wavelengths, while the sign of the chirp affects the interaction in the temporal domain. It is also found that, when two identical pulses are launched simultaneously from both ends, a subwavelength counterpart of a standing-wave state can be established. It shows robust energy localization in the middle of the sample, in terms of both the spectral and temporal intensity distributions.
Submitted 26 November, 2019;
originally announced November 2019.
-
Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model
Authors:
Dajiang Zhu,
Brandalyn C. Riedel,
Neda Jahanshad,
Nynke A. Groenewold,
Dan J. Stein,
Ian H. Gotlib,
Matthew D. Sacchet,
Danai Dima,
James H. Cole,
Cynthia H. Y. Fu,
Henrik Walter,
Ilya M. Veer,
Thomas Frodl,
Lianne Schmaal,
Dick J. Veltman,
Paul M. Thompson
Abstract:
Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large-scale distributed imaging and biobank data.
Submitted 3 June, 2017; v1 submitted 26 May, 2017;
originally announced May 2017.