DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer

Authors: Aihua Mao, Yuxuan Tang, Jiangtao Huang, Ying He

Abstract: In this paper we study the task of single-view image-guided point cloud completion. Existing methods have achieved promising results by fusing image information into the point cloud explicitly or implicitly. However, given that the image carries global shape information and the partial point cloud carries rich local details, we believe that both modalities need to be given equal attention when performing modality fusion. To this end, we propose a novel dual-channel modality fusion network for image-guided point cloud completion (named DMF-Net), which works in a coarse-to-fine manner. In the first stage, DMF-Net takes a partial point cloud and the corresponding image as input to recover a coarse point cloud. In the second stage, the coarse point cloud is upsampled twice with a shape-aware upsampling transformer to obtain the dense and complete point cloud. Extensive quantitative and qualitative experimental results show that DMF-Net outperforms state-of-the-art unimodal and multimodal point cloud completion methods on the ShapeNet-ViPC dataset.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.15694 [pdf, other]

doi 10.1007/s11263-024-02141-4

Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection

Authors: Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang

Abstract: The bitemporal supervised learning paradigm has long dominated remote sensing change detection, relying on numerous labeled bitemporal image pairs, especially for high spatial resolution (HSR) remote sensing imagery. However, labeling change regions in large-scale bitemporal HSR remote sensing image pairs is very expensive and labor-intensive. In this paper, we propose single-temporal supervised learning (STAR) for universal remote sensing change detection from a new perspective: exploiting changes between unpaired images as supervisory signals. STAR enables us to train a high-accuracy change detector using only unpaired labeled images, and the detector generalizes to real-world bitemporal image pairs. To demonstrate the flexibility and scalability of STAR, we design a simple yet unified change detector, termed ChangeStar2, capable of addressing binary change detection, object change detection, and semantic change detection in one architecture. ChangeStar2 achieves state-of-the-art performance on eight public remote sensing change detection datasets, covering the two supervised settings above, multiple change types, and multiple scenarios. The code is available at https://github.com/Z-Zheng/pytorch-change-models.

Submitted 21 June, 2024; originally announced June 2024.

Comments: IJCV 2024. arXiv admin note: text overlap with arXiv:2108.07002
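The idea of using changes between unpaired images as supervision can be illustrated with a small sketch. It assumes binary object masks of equal size and derives a change label for an artificial pair of independently labeled images as the pixel-wise XOR of their masks; pairing each sample with the next one in the batch is an illustrative choice. The names `change_label` and `make_pseudo_pairs` are hypothetical, and this is not necessarily the exact recipe used for STAR or ChangeStar2.

```python
from typing import List, Sequence, Tuple, TypeVar

Mask = List[List[int]]  # hypothetical binary object mask: 1 = object pixel, 0 = background
T = TypeVar("T")


def change_label(mask_a: Mask, mask_b: Mask) -> Mask:
    """Binary change map: 1 where exactly one of the two masks marks an object (pixel-wise XOR)."""
    if len(mask_a) != len(mask_b) or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    return [[a ^ b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(mask_a, mask_b)]


def make_pseudo_pairs(images: Sequence[T], masks: Sequence[Mask]) -> List[Tuple[T, T, Mask]]:
    """Pair sample i with sample (i + 1) % n (for n >= 2 no image is paired with itself)
    and attach the XOR change label as the supervisory signal for that pair."""
    n = len(images)
    pairs = []
    for i in range(n):
        j = (i + 1) % n
        pairs.append((images[i], images[j], change_label(masks[i], masks[j])))
    return pairs
```

A change detector would then be trained to predict the third element of each tuple from the first two; a random permutation of the batch is an equally simple alternative to the fixed shift used here.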

arXiv:2406.10215 [pdf, other]

DevBench: A multimodal developmental benchmark for language learning

Authors: Alvin Wei Ming Tan, Sunny Yu, Bria Long, Wanjing Anya Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, Michael C. Frank

Abstract: How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and without direct comparison to behavioral data. We introduce DevBench, a multimodal benchmark comprising seven language evaluation tasks spanning the domains of lexical, syntactic, and semantic ability, with behavioral data from both children and adults. We evaluate a set of vision-language models on these tasks, comparing models and humans not only on accuracy but on their response patterns. Across tasks, models exhibit variation in their closeness to human response patterns, and models that perform better on a task also more closely resemble human behavioral responses. We also examine the developmental trajectory of OpenCLIP over training, finding that greater training results in closer approximations to adult response patterns. DevBench thus provides a benchmark for comparing models to human language development. These comparisons highlight ways in which model and human language learning processes diverge, providing insight into entry points for improving language models.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.07905 [pdf, other]

PLUTO: Pathology-Universal Transformer

Authors: Dinkar Juyal, Harshith Padigela, Chintan Shah, Daniel Shenker, Natalia Harguindeguy, Yi Liu, Blake Martin, Yibo Zhang, Michael Nercessian, Miles Markey, Isaac Finberg, Kelsey Luu, Daniel Borders, Syed Ashar Javed, Emma Krause, Raymond Biju, Aashish Sood, Allen Ma, Jackson Nyman, John Shamshoian, Guillaume Chhor, Darpan Sanghavi, Marc Thibault, Limin Yu, Fedaa Najdawi , et al. (8 additional authors not shown)

Abstract: Pathology is the study of tissue through microscopic inspection, and a pathology diagnosis is often the medical gold standard for diagnosing disease. Pathology images pose a unique challenge for computer-vision-based analysis: a single pathology Whole Slide Image (WSI) is gigapixel-sized and often contains hundreds of thousands to millions of objects of interest across multiple resolutions. In this work, we propose PathoLogy Universal TransfOrmer (PLUTO): a light-weight pathology foundation model (FM) that is pre-trained on a diverse dataset of 195 million image tiles collected from multiple sites and extracts meaningful representations across multiple WSI scales that enable a large variety of downstream pathology tasks. In particular, we design task-specific adaptation heads that utilize PLUTO's output embeddings for tasks which span pathology scales ranging from subcellular to slide-scale, including instance segmentation, tile classification, and slide-level prediction. We compare PLUTO's performance to other state-of-the-art methods on a diverse set of external and internal benchmarks covering multiple biologically relevant tasks, tissue types, resolutions, stains, and scanners. We find that PLUTO matches or outperforms existing task-specific baselines and pathology-specific foundation models, some of which use orders-of-magnitude larger datasets and model sizes than PLUTO.
Our findings present a path towards a universal embedding to power pathology image analysis, and motivate further exploration of pathology foundation models in terms of data diversity, architectural improvements, sample efficiency, and practical deployability in real-world applications.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.05968 [pdf, other]

A Universal Growth Rate for Learning with Smooth Surrogate Losses

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our lower bound requires weaker conditions than those in previous work for excess error bounds, and our upper bound is entirely novel. Moreover, we extend this analysis to multi-class classification with a series of novel results, demonstrating a universal square-root growth rate for smooth comp-sum and constrained losses, covering common choices for training neural networks in multi-class classification. Given this universal rate, we turn to the question of choosing among different surrogate losses. We first examine how $H$-consistency bounds vary across surrogates based on the number of classes. Next, ignoring constants and focusing on behavior near zero, we identify minimizability gaps as the key differentiating factor in these bounds. Thus, we thoroughly analyze these gaps, to guide surrogate loss selection, covering: comparisons across different comp-sum losses, conditions where gaps become zero, and general conditions leading to small gaps. Additionally, we demonstrate the key role of minimizability gaps in comparing excess error bounds and $H$-consistency bounds.

Submitted 9 May, 2024; originally announced May 2024.
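As background for the terms used above, $H$-consistency bounds in this line of work are commonly stated in the following general shape (a sketch of the form only, not a result quoted from the paper). Here $\ell$ is the target loss (e.g. the zero-one loss), $\Phi$ a surrogate, $\mathcal{E}$ the expected loss, $\mathcal{E}^*(\mathcal{H})$ its infimum over the hypothesis set $\mathcal{H}$, $\mathcal{M}$ the minimizability gap, and $\Gamma$ a non-decreasing function:

$$
\mathcal{E}_{\ell}(h) - \mathcal{E}^{*}_{\ell}(\mathcal{H}) + \mathcal{M}_{\ell}(\mathcal{H})
\;\le\;
\Gamma\big(\mathcal{E}_{\Phi}(h) - \mathcal{E}^{*}_{\Phi}(\mathcal{H}) + \mathcal{M}_{\Phi}(\mathcal{H})\big)
\quad \text{for all } h \in \mathcal{H},
$$

$$
\mathcal{M}_{L}(\mathcal{H}) = \mathcal{E}^{*}_{L}(\mathcal{H}) - \mathbb{E}_{x}\Big[\inf_{h \in \mathcal{H}} \mathbb{E}_{y \mid x}\big[L(h, x, y)\big]\Big], \qquad L \in \{\ell, \Phi\}.
$$

A square-root growth rate means $\Gamma(t)$ scales like $\sqrt{t}$ as $t \to 0$. When $\mathcal{H}$ is the family of all measurable functions, both gaps vanish and the inequality reduces to an excess error bound; otherwise the gaps $\mathcal{M}$ are the terms the abstract identifies as the key differentiating factor among surrogates.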

arXiv:2403.19625 [pdf, other]

Top-$k$ Classification and Cardinality-Aware Prediction

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: We present a detailed study of top-$k$ classification, the task of predicting the $k$ most probable classes for an input, extending beyond single-class prediction. We demonstrate that several prevalent surrogate loss functions in multi-class classification, such as comp-sum and constrained losses, are supported by $H$-consistency bounds with respect to the top-$k$ loss. These bounds guarantee consistency in relation to the hypothesis set $H$, providing stronger guarantees than Bayes-consistency due to their non-asymptotic and hypothesis-set specific nature. To address the trade-off between accuracy and cardinality $k$, we further introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning. For these functions, we derive cost-sensitive comp-sum and constrained surrogate losses, establishing their $H$-consistency bounds and Bayes-consistency. Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification. We report the results of extensive experiments on CIFAR-100, ImageNet, CIFAR-10, and SVHN datasets demonstrating the effectiveness and benefit of these algorithms.

Submitted 28 March, 2024; originally announced March 2024.
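The target loss in this setting, the top-$k$ loss, can be made concrete with a short sketch: it is 0 when the true label is among the $k$ highest-scoring classes and 1 otherwise. The function names and the tie-breaking rule (lower class index wins) are assumptions for illustration; the surrogate and cardinality-aware losses from the paper are not reproduced here.

```python
from typing import List, Sequence


def top_k_classes(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring classes; ties are broken toward the lower index."""
    return sorted(range(len(scores)), key=lambda c: (-scores[c], c))[:k]


def top_k_loss(scores: Sequence[float], label: int, k: int) -> int:
    """Top-k zero-one loss: 0 if the true label is among the k top-scoring classes, else 1."""
    return 0 if label in top_k_classes(scores, k) else 1


def top_k_error(all_scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Average top-k loss over a labeled sample."""
    return sum(top_k_loss(s, y, k) for s, y in zip(all_scores, labels)) / len(labels)
```

Because the top-$k$ set only grows as $k$ increases, this loss is non-increasing in $k$; the cardinality-aware losses described above are meant to trade that gain against the cost of returning more classes.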

arXiv:2403.19494 [pdf, ps, other]

Regression with Multi-Expert Deferral

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts. While this problem has received significant attention in classification contexts, it presents unique challenges in regression due to the infinite and continuous nature of the label space. In this work, we introduce a novel framework of regression with deferral, which involves deferring the prediction to multiple experts. We present a comprehensive analysis for both the single-stage scenario, where there is simultaneous learning of predictor and deferral functions, and the two-stage scenario, which involves a pre-trained predictor with a learned deferral function. We introduce new surrogate loss functions for both scenarios and prove that they are supported by $H$-consistency bounds. These bounds provide consistency guarantees that are stronger than Bayes consistency, as they are non-asymptotic and hypothesis set-specific. Our framework is versatile, applying to multiple experts, accommodating any bounded regression losses, addressing both instance-dependent and label-dependent costs, and supporting both single-stage and two-stage methods. A by-product is that our single-stage formulation includes the recent regression with abstention framework (Cheng et al., 2023) as a special case, where only a single expert, the squared loss and a label-independent cost are considered. Minimizing our proposed loss functions directly leads to novel algorithms for regression with deferral.
We report the results of extensive experiments showing the effectiveness of our proposed algorithms.

Submitted 28 March, 2024; originally announced March 2024.
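The deferral loss that the surrogates above are designed to approximate can be sketched for a single example. This assumes one routing score per option (option 0 keeps the learner's prediction, option $j \ge 1$ defers to expert $j$), the squared loss, and a hypothetical cost for consulting expert $j$ equal to that expert's squared loss plus a fixed fee. The paper's framework allows general bounded losses and instance- or label-dependent costs, and the surrogate losses that are actually trained are not reproduced here; all names are illustrative.

```python
from typing import Callable, Sequence

Loss = Callable[[float, float], float]


def squared_loss(pred: float, y: float) -> float:
    return (pred - y) ** 2


def route(scores: Sequence[float]) -> int:
    """Deferral rule: pick the option with the highest score (0 = learner, j >= 1 = expert j)."""
    return max(range(len(scores)), key=lambda j: scores[j])


def deferral_loss(choice: int, learner_pred: float, expert_preds: Sequence[float],
                  y: float, expert_fees: Sequence[float],
                  loss: Loss = squared_loss) -> float:
    """Loss incurred on one example (x, y) by a given routing choice."""
    if choice == 0:
        return loss(learner_pred, y)  # keep the learner's own prediction
    j = choice - 1
    # Hypothetical cost model: the expert's regression loss plus a fixed, label-independent fee.
    return loss(expert_preds[j], y) + expert_fees[j]
```

In the two-stage scenario `learner_pred` comes from a fixed pre-trained predictor and only the routing scores are learned; in the single-stage scenario both are learned jointly.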

arXiv:2403.19480 [pdf, ps, other]

$H$-Consistency Guarantees for Regression

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: We present a detailed study of $H$-consistency bounds for regression. We first present new theorems that generalize the tools previously given to establish $H$-consistency bounds. This generalization proves essential for analyzing $H$-consistency bounds specific to regression. Next, we prove a series of novel $H$-consistency bounds for surrogate loss functions of the squared loss, under the assumption of a symmetric distribution and a bounded hypothesis set. This includes positive results for the Huber loss, all $\ell_p$ losses, $p \geq 1$, the squared $\epsilon$-insensitive loss, as well as a negative result for the $\epsilon$-insensitive loss used in squared Support Vector Regression (SVR). We further leverage our analysis of $H$-consistency for regression and derive principled surrogate losses for adversarial regression (Section 5). This readily establishes novel algorithms for adversarial regression, for which we report favorable experimental results in Section 6.

Submitted 28 March, 2024; originally announced March 2024.
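For reference, the surrogate losses named above can be written in their textbook forms. The default parameter values (`delta`, `p`, `eps`) are hypothetical, and the exact parameterizations and scalings analyzed in the paper may differ.

```python
def squared(pred: float, y: float) -> float:
    """The target loss: (pred - y)^2."""
    return (pred - y) ** 2


def huber(pred: float, y: float, delta: float = 1.0) -> float:
    """Quadratic for residuals up to delta, linear beyond it (continuous at |r| = delta)."""
    r = abs(pred - y)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


def l_p(pred: float, y: float, p: float = 1.0) -> float:
    """The ell_p loss |pred - y|^p, defined here for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return abs(pred - y) ** p


def eps_insensitive(pred: float, y: float, eps: float = 0.1) -> float:
    """Zero inside the tube |pred - y| <= eps, linear outside it."""
    return max(0.0, abs(pred - y) - eps)


def squared_eps_insensitive(pred: float, y: float, eps: float = 0.1) -> float:
    """Square of the eps-insensitive loss."""
    return eps_insensitive(pred, y, eps) ** 2
```

Each loss depends only on the residual `pred - y`. The $\epsilon$-insensitive loss is flat inside the tube and has a kink at its edge, whereas its squared version and the Huber loss are differentiable everywhere.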

arXiv:2403.13500 [pdf, other]

doi 10.1051/0004-6361/202348993

The Galactic latitude dependency of Faraday complexity in the S-PASS/ATCA RM catalogue

Authors: S. Ranchod, S. A. Mao, R. Deane, S. S. Sridhar, A. Damas-Segovia, J. D. Livingston, Y. K. Ma

Abstract: The S-band Polarisation All Sky Survey (SPASS/ATCA) rotation measure (RM) catalogue is the largest broadband RM catalogue to date, increasing the RM density in the sparse southern sky. Through analysis of this catalogue, we report a latitude dependency of the Faraday complexity of polarised sources in this catalogue within 10$^\circ$ of the Galactic plane towards the inner Galaxy. In this study, we aim to investigate this trend with follow-up observations using the Australia Telescope Compact Array (ATCA). We observe 95 polarised sources from the SPASS/ATCA RM catalogue at 1.1 - 3.1 GHz with ATCA's 6 km configuration. We present Stokes QU fitting results and a comparative analysis with the SPASS/ATCA catalogue. We find an overall decrease in complexity in these sources with the higher angular resolution observations, with a complexity fraction of 42%, establishing that the majority of the complexity in the SPASS/ATCA sample is due to the mixing-in of diffuse Galactic emission at scales $\theta > 2.8'$. Furthermore, we find a correlation between our observed small-scale complexity ($\theta < 2.8'$) and the Galactic spiral arms, which we interpret to be due to Galactic turbulence or small-scale polarised emission. These results emphasise the importance of considering the maximum angular scale to which the observations are sensitive in the classification of Faraday complexity; the effect of which can be more carefully investigated with SKA-precursor and pathfinder arrays (e.g. MeerKAT and ASKAP).

Submitted 20 March, 2024; originally announced March 2024.

Comments: 16 pages, 16 figures

Journal ref: A&A 686, A104 (2024)
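To make "Faraday complexity" concrete, the sketch below uses the standard Faraday-thin model of complex linear polarisation, $P = Q + iU = p_0\, e^{2i(\psi_0 + \mathrm{RM}\,\lambda^2)}$, and a sum of such components. It is an illustration under that assumption only, not the QU-fitting models or pipeline used in the paper; the function names are hypothetical.

```python
import cmath

C = 299_792_458.0  # speed of light in m/s


def lambda_sq(freq_hz: float) -> float:
    """Wavelength squared in m^2 for an observing frequency in Hz."""
    return (C / freq_hz) ** 2


def faraday_thin(lam2: float, p0: float, psi0: float, rm: float) -> complex:
    """P = Q + iU for one Faraday-thin component: p0 * exp(2i(psi0 + RM * lambda^2)).
    psi0 is in radians and rm in rad/m^2."""
    return p0 * cmath.exp(2j * (psi0 + rm * lam2))


def multi_component(lam2: float, components) -> complex:
    """Sum of Faraday-thin components, each given as a (p0, psi0, rm) tuple."""
    return sum(faraday_thin(lam2, p0, psi0, rm) for p0, psi0, rm in components)
```

A single component keeps $|P|$ constant while the polarisation angle rotates linearly in $\lambda^2$. Blending components with different RMs, for example diffuse emission mixed into a large beam, makes $|P|$ vary with $\lambda^2$, which is the kind of behaviour that QU fitting classifies as Faraday complex. The 1.1 - 3.1 GHz band used here corresponds to roughly 0.009 - 0.074 m$^2$ in $\lambda^2$.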

arXiv:2403.08348 [pdf, other]

A programmable topological photonic chip

Authors: Tianxiang Dai, Anqi Ma, Jun Mao, Yutian Ao, Xinyu Jia, Yun Zheng, Chonghao Zhai, Yan Yang, Zhihua Li, Bo Tang, Jun Luo, Baile Zhang, Xiaoyong Hu, Qihuang Gong, Jianwei Wang

Abstract: Controlling topological phases of light has allowed experimental observations of abundant topological phenomena and the development of robust photonic devices. The prospect of more sophisticated control of topological photonic devices in practical implementations requires high-level programmability. Here, we demonstrate a fully programmable topological photonic chip with large-scale integration of silicon photonic nanocircuits and microresonators. Photonic artificial atoms and their interactions in our compound system can be individually addressed and controlled, allowing arbitrary alteration of structural parameters and geometrical configurations for the observation of dynamic topological phase transitions and diverse photonic topological insulators. Individually programming artificial atoms on the generic chip has allowed comprehensive statistical characterisations of topological robustness against relatively weak disorders, as well as counterintuitive topological Anderson phase transitions induced by strong disorders. Our generic topological photonic chip, which can be rapidly reprogrammed to implement multiple functionalities, prototypes a flexible and versatile platform for possible applications across fundamental science and topological technologies.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08279 [pdf, ps, other]

On the conservation laws and the structure of the nonlinearity for SQG and its generalizations

Authors: Philip Isett, Andrew Ma

Abstract: Using a new definition for the nonlinear term, we prove that all weak solutions to the SQG equation (and mSQG) conserve the angular momentum. This result is new for the weak solutions of [Resnick, '95] and rules out the possibility of anomalous dissipation of angular momentum. We also prove conservation of the Hamiltonian under conjecturally optimal assumptions, sharpening a well-known criterion of [Cheskidov-Constantin-Friedlander-Shvydkoy, '08]. Moreover, we show that our new estimate for the nonlinearity is optimal and that it characterizes the mSQG nonlinearity uniquely among active scalar nonlinearities with a scaling symmetry.

Submitted 13 March, 2024; originally announced March 2024.
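For orientation, the SQG equation referred to above is commonly written, for a scalar $\theta(t, x)$ with $x \in \mathbb{R}^2$ (or the torus), as follows. Sign conventions for $\nabla^{\perp}$ vary between papers, and the mSQG family modifies the order of the operator relating $u$ to $\theta$, which is not written out here.

$$
\partial_t \theta + u \cdot \nabla \theta = 0,
\qquad
u = \nabla^{\perp} (-\Delta)^{-1/2} \theta,
\qquad
\nabla^{\perp} = (-\partial_{x_2}, \partial_{x_1}).
$$

Because $\nabla \cdot u = 0$, the nonlinearity can be written in divergence form, $u \cdot \nabla \theta = \nabla \cdot (u\,\theta)$, so weak formulations test the equation against smooth functions using the product $u\,\theta$ rather than $u \cdot \nabla \theta$; the abstract's new definition concerns how this nonlinear term is defined and estimated.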

arXiv:2403.07499 [pdf, other]

doi 10.1103/PhysRevD.109.116009

Low-mass enhancement of kaon pairs in $B^+\to\bar{D}^{(*)0}K^+\bar{K}^0$ and $B^0\to D^{(*)-}K^+\bar{K}^0$ decays

Authors: Wen-Fei Wang, Li-Fei Yang, Ai-Jun Ma, Àngels Ramos

Abstract: Very recently, the Belle II Collaboration presented a measurement of the decays $B^+\to\bar{D}^{(*)0} K^+\bar{K}^0$ and $B^0\to D^{(*)-}K^+\bar{K}^0$, with the bulk of the observed $m(K^+ K_S^0)$ distributions showing low-mass structures in all four channels. In this work, we study the contributions of the $\rho(770,1450)^+$, $a_2(1320)^+$ and $a_0(980,1450)^+$ resonances to these decay processes. The intermediate states $\rho(770,1450)^+$ are found to dominate the low-mass distribution of kaon pairs, contributing roughly half of the total branching fraction in each of the four decay channels. The contribution of the tensor $a_2(1320)^+$ meson is found to be negligible. Near the threshold of the kaon pair, the state $a_0(980)^+$ turns out to be much less important than expected, and cannot account for the enhancement of events in that energy region observed in the $B^+\to\bar{D}^{(*)0} K^+\bar{K}^0$ decays. Further studies from both the theoretical and experimental sides are needed to elucidate the role of the non-resonant contributions governing the formation of $K^+\bar{K}^0$ pairs near their threshold in these decay processes.

Submitted 11 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 18 pages, 4 figures

Journal ref: Phys.Rev.D 109, 116009 (2024)

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.01202 [pdf, other]

Atacama Large Aperture Submillimeter Telescope (AtLAST) science: Gas and dust in nearby galaxies

Authors: Daizhong Liu, Amelie Saintonge, Caroline Bot, Francisca Kemper, Enrique Lopez-Rodriguez, Matthew W. L. Smith, Thomas Stanke, Paola Andreani, Alessandro Boselli, Claudia Cicone, Timothy A. Davis, Bendix Hagedorn, Akhil Lasrado, Ann Mao, Serena Viti, Mark Booth, Pamela Klaassen, Tony Mroczkowski, Frank Bigiel, Melanie Chevance, Martin A. Cordiner, Luca Di Mascolo, Doug Johnstone, Minju M. Lee, Thomas Maccarone , et al. (3 additional authors not shown)

Abstract: Understanding the physical processes that regulate star formation and galaxy evolution is a major area of activity in modern astrophysics. Nearby galaxies offer unique opportunities to inspect the interstellar medium (ISM), star formation (SF), and radiative, dynamic and magnetic physics in great detail from sub-galactic (kpc) scales to sub-cloud (sub-pc) scales, from quiescent galaxies to starbursts, and from field galaxies to overdensities. In this case study, we discuss the major breakthroughs in this area of research that will be enabled by the Atacama Large Aperture Submillimeter Telescope (AtLAST), a proposed 50-m single-dish submillimeter telescope. The new discovery space of AtLAST comes from its exceptional sensitivity, in particular to extended low surface brightness emission, a very large 2 degree field of view, and correspondingly high mapping efficiency. This paper focuses on four themes which will particularly benefit from AtLAST: 1) the LMC and SMC, 2) extragalactic magnetic fields, 3) the physics and chemistry of the interstellar medium, and 4) star formation and galaxy evolution. With ~1000-2000h surveys each, AtLAST could deliver deep dust continuum maps of the entire LMC and SMC fields at parsec-scale resolution, high-resolution maps of the magnetic field structure, gas density, temperature and composition of the dense and diffuse ISM in ~100 nearby galaxies, as well as the first large-scale blind CO survey in the nearby Universe, delivering molecular gas masses for up to $10^6$ galaxies (3 orders of magnitude more than current samples).
Through such observing campaigns, AtLAST will have a profound impact on our understanding of the baryon cycle and star formation across a wide range of environments.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 29 pages, 11 figures, submitted to Open Research Europe as part of the AtLAST collection: https://open-research-europe.ec.europa.eu/collections/atlast/about

arXiv:2403.00892 [pdf, other]

PowerFlowMultiNet: Multigraph Neural Networks for Unbalanced Three-Phase Distribution Systems

Authors: Salah Ghamizi, Jun Cao, Aoxiang Ma, Pedro Rodriguez

Abstract: Efficiently solving unbalanced three-phase power flow in distribution grids is pivotal for grid analysis and simulation. There is a pressing need for scalable algorithms capable of handling large-scale unbalanced power grids that can provide accurate and fast solutions. To address this, deep learning techniques, especially Graph Neural Networks (GNNs), have emerged. However, existing literature primarily focuses on balanced networks, leaving a critical gap in supporting unbalanced three-phase power grids. This letter introduces PowerFlowMultiNet, a novel multigraph GNN framework explicitly designed for unbalanced three-phase power grids. The proposed approach models each phase separately in a multigraph representation, effectively capturing the inherent asymmetry in unbalanced grids. A graph embedding mechanism utilizing message passing is introduced to capture spatial dependencies within the power system network. PowerFlowMultiNet outperforms traditional methods and other deep learning approaches in terms of accuracy and computational speed. Rigorous testing reveals significantly lower error rates and a notable hundredfold increase in computational speed for large power networks compared to model-based methods.

Submitted 12 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.18078 [pdf, other]

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Authors: Yanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

Abstract: Diffusion models are a promising approach to image generation and have been employed for Pose-Guided Person Image Synthesis (PGPIS) with competitive performance. While existing methods simply align the person appearance to the target pose, they are prone to overfitting due to the lack of a high-level semantic understanding of the source person image. In this paper, we propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for PGPIS. In the absence of image-caption pairs and textual prompts, we develop a novel training paradigm purely based on images to control the generation process of a pre-trained text-to-image diffusion model. A perception-refined decoder is designed to progressively refine a set of learnable queries and extract a semantic understanding of person images as a coarse-grained prompt. This allows the decoupling of fine-grained appearance and pose information controls at different stages, thus circumventing the potential overfitting problem. To generate more realistic texture details, a hybrid-granularity attention module is proposed to encode multi-scale fine-grained appearance features as bias terms to augment the coarse-grained prompt. Both quantitative and qualitative experimental results on the DeepFashion benchmark demonstrate the superiority of our method over the state of the art for PGPIS. Code is available at https://github.com/YanzuoLu/CFLD.

Submitted 9 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024 (Highlight)

arXiv:2402.13786 [pdf, ps, other]

Degree conditions for disjoint path covers in digraphs

Authors: Ansong Ma, Yuefang Sun

Abstract: In this paper, we study degree conditions for three types of disjoint directed path cover problems: many-to-many $k$-DDPC, one-to-many $k$-DDPC and one-to-one $k$-DDPC, which are intimately connected to other well-known topics in graph theory, such as Hamiltonicity and $k$-linkage, and have strong application backgrounds. First, we obtain two sharp minimum semi-degree sufficient conditions for the unpaired many-to-many $k$-DDPC problem and a sharp Ore-type degree condition for the paired many-to-many $2$-DDPC problem. Second, we obtain a minimum semi-degree sufficient condition for the one-to-many $k$-DDPC problem on a digraph of order $n$, and show that the bound for the minimum semi-degree is sharp when $n+k$ is even and is sharp up to an additive constant 1 otherwise. Finally, we give a minimum semi-degree sufficient condition for the one-to-one $k$-DDPC problem on a digraph of order $n$, and show that the bound for the minimum semi-degree is sharp when $n+k$ is odd and is sharp up to an additive constant 1 otherwise.

Submitted 28 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.
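The sufficient conditions above are stated in terms of the minimum semi-degree δ⁰(D), the smallest in-degree or out-degree over all vertices of D. As a minimal sketch, assuming a loopless digraph stored as a Python dict that maps each vertex to the set of its out-neighbours (a representation chosen here for illustration; the function name is hypothetical), δ⁰(D) can be computed as follows. The sharp bounds themselves are not reproduced.

```python
def min_semi_degree(digraph):
    """Return the minimum over all vertices of min(out-degree, in-degree).

    `digraph` maps each vertex to the set of its out-neighbours; every
    vertex must appear as a key, even if it has no out-arcs.
    """
    in_degree = {v: 0 for v in digraph}
    for outs in digraph.values():
        for w in outs:
            in_degree[w] += 1
    return min(min(len(digraph[v]), in_degree[v]) for v in digraph)


# A directed 4-cycle: every vertex has in- and out-degree 1.
print(min_semi_degree({0: {1}, 1: {2}, 2: {3}, 3: {0}}))
```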

arXiv:2402.12474 [pdf, other]

CGOLS V: Disk-wide Stellar Feedback and Observational Implications of the Cholla Galactic Wind Model

Authors: Evan E. Schneider, S. Alwin Mao

Abstract: We present the fifth simulation in the CGOLS project, a set of isolated starburst galaxy simulations modeled over large scales (10 kpc) at uniformly high resolution (Δx ≈ 5 pc). Supernova feedback in this simulation is implemented as a disk-wide distribution of clusters, and we assess the impact of this geometry on several features of the resulting outflow, including radial profiles of various phases; mass, momentum, and energy outflow rates; covering fraction of cool gas; mock absorption-line spectra; and X-ray surface brightness. In general, we find that the outflow generated by this model is cooler, slower, and contains more mass in the cool phase than a more centrally concentrated outflow driven by a similar number of supernovae. In addition, the energy loading factors in the hot phase are an order of magnitude lower, indicating much larger losses due to radiative cooling in the outflow. However, coupling between the hot and cool phases is more efficient than in the nuclear burst case, with almost 50% of the total outflowing energy flux carried by the cool phase at a radial distance of 5 kpc. These physical differences have corresponding signatures in observable quantities: the covering fraction of cool gas is much larger, and there is greater evidence of absorption in low and intermediate ionization-energy lines.
Taken together, our simulations indicate that centrally concentrated starbursts are more effective at driving hot, low-density outflows that will expand far into the halo, while galaxy-wide bursts may be more effective at removing cool gas from the disk.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 22 pages, 13 figures, accepted in ApJ
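Energy loading factors compare the energy carried by an outflow with the energy injected by supernovae. A minimal sketch, assuming the conventional 10^51 erg per supernova (the simulation's own normalization may differ) and using hypothetical function names, shows how such a factor and the per-phase share of the outflowing energy flux (the quantity behind the "almost 50% carried by the cool phase" statement) would be computed. The inputs below are illustrative numbers, not simulation results.

```python
SN_ENERGY_ERG = 1.0e51  # assumed energy injected per supernova, in erg


def energy_loading(energy_outflow_rate, sn_rate_per_yr):
    """Outflowing energy flux (erg/yr) divided by the SN energy input rate (erg/yr)."""
    return energy_outflow_rate / (SN_ENERGY_ERG * sn_rate_per_yr)


def phase_fractions(energy_rates_by_phase):
    """Fraction of the total outflowing energy flux carried by each gas phase."""
    total = sum(energy_rates_by_phase.values())
    return {phase: rate / total for phase, rate in energy_rates_by_phase.items()}


# Illustrative inputs only.
print(energy_loading(2.0e51, 10.0))
print(phase_fractions({"hot": 1.0e50, "cool": 1.0e50}))
```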

arXiv:2402.10434 [pdf, other]

Parametric Augmentation for Time Series Contrastive Learning

Authors: Xu Zheng, Tianchun Wang, Wei Cheng, Aitian Ma, Haifeng Chen, Mo Sha, Dongsheng Luo

Abstract: Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data augmentations. Because the relevant patterns are easily recognized by humans, this rule of thumb works well in the vision and language domains. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a contrastive learning framework with parametric augmentation, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on univariate forecasting tasks demonstrate the highly competitive results of our method, with an average 6.5% reduction in MSE and 4.7% in MAE over the leading baselines. In classification tasks, AutoTCL achieves a 1.2% increase in average accuracy.

Submitted 15 February, 2024; originally announced February 2024.

Comments: Accepted by International Conference on Learning Representations (ICLR 2024)

arXiv:2401.16450 [pdf, other]

ACCESS: Prompt Engineering for Automated Web Accessibility Violation Corrections

Authors: Calista Huang, Alyssa Ma, Suchir Vyasamudri, Eugenie Puype, Sayem Kamal, Juan Belza Garcia, Salar Cheema, Michael Lutz

Abstract: With the increasing need for inclusive and user-friendly technology, web accessibility is crucial to ensuring equal access to online content for individuals with disabilities, including visual, auditory, cognitive, or motor impairments. Despite the existence of accessibility guidelines and standards such as the Web Content Accessibility Guidelines (WCAG) and the Web Accessibility Initiative (WAI) of the W3C, over 90% of websites still fail to meet the necessary accessibility requirements. Web users with disabilities need a tool that automatically fixes web page accessibility errors. While research has demonstrated methods to find and target accessibility errors, no research has focused on effectively correcting such violations. This paper presents a novel approach to correcting accessibility violations on the web by modifying the document object model (DOM) in real time with foundation models. Leveraging accessibility error information, large language models (LLMs), and prompt engineering techniques, we achieved greater than a 51% reduction in accessibility violation errors after corrections on our novel benchmark: ACCESS. Our work demonstrates a valuable approach toward inclusive web content and provides directions for future research to explore advanced methods to automate web accessibility.

Submitted 10 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: 11 pages, 6 figures

arXiv:2401.16348 [pdf, other]

Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

Authors: Zongxia Li, Andrew Mao, Daniel Stephens, Pranav Goel, Emily Walpole, Alden Dima, Juan Fung, Jordan Boyd-Graber

Abstract: Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention. Automated evaluation metrics such as coherence are often used; however, their validity has been questioned for neural topic models (NTMs), and they can overlook a model's benefits in real-world applications. To this end, we conduct the first evaluation of neural, supervised and classical topic models in an interactive task-based setting. We combine topic models with a classifier and test their ability to help humans conduct content analysis and document annotation. In simulated, real-user, and expert pilot studies, the Contextual Neural Topic Model performs best on cluster evaluation metrics and human evaluations; however, LDA is competitive with two other NTMs under our simulated experiment and user study results, contrary to what coherence scores suggest. We show that current automated metrics do not provide a complete picture of topic modeling capabilities, but the right choice of NTMs can be better than classical models on practical tasks.

Submitted 19 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 19 pages, 5 tables, 6 figures, Accepted to EACL Main Conference 2024

arXiv:2401.02892 [pdf, other]

Rational Approximation of Golden Angles: Accelerated Reconstructions with Simple and Numerically Reproducible Radial Sampling

Authors: Nick Scholand, Philip Schaten, Christina Graf, Daniel Mackner, H. Christian M. Holme, Moritz Blumenthal, Andrew Mao, Jakob Assländer, Martin Uecker

Abstract: Purpose: To develop a generic radial sampling scheme that combines the advantages of golden ratio sampling with the simplicity of equidistant angular patterns. The irrational angle between consecutive spokes in golden-ratio-based sampling schemes enables a flexible retrospective choice of temporal resolution, while preserving good coverage of k-space for each individual bin. Nevertheless, irrational increments prohibit precomputation of the point-spread function (PSF), can lead to numerical problems, and require more complex processing steps. To avoid these problems, a new sampling scheme based on a rational approximation of golden angles (RAGA) is developed. Methods: The theoretical properties of RAGA sampling are mathematically derived. Sidelobe-to-peak ratios (SPR) are numerically computed and compared to the corresponding golden ratio sampling schemes. The sampling scheme is implemented in the BART toolbox and in a radial gradient-echo sequence. Feasibility is shown for quantitative imaging in a phantom and a cardiac scan of a healthy volunteer. Results: RAGA sampling can accurately approximate golden ratio sampling and has almost identical PSF and SPR. In contrast to golden ratio sampling, each frame can be reconstructed with the same equidistant trajectory using different sampling masks, and the angle of each acquired spoke can be encoded as a small index, which simplifies processing of the acquired data.
Conclusion: RAGA sampling provides the advantages of golden ratio sampling while simplifying data processing, rendering it a valuable tool for dynamic and quantitative MRI.

Submitted 30 May, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 27 pages, 7 figures, 3 tables
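The core idea of replacing the irrational golden-angle increment with a rational one, so that every spoke lands on an equidistant grid and can be stored as a small integer index, can be sketched in a few lines of Python. This sketch assumes the classic Fibonacci approximation 1/φ ≈ F(n−1)/F(n) of the standard golden angle π/φ; the paper's treatment of other (tiny) golden angles and its exact choice of approximants are not reproduced here, and all function names are hypothetical.

```python
import math


def fibonacci(n):
    """Return the n-th Fibonacci number, with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def raga_indices(num_spokes, fib_order):
    """Integer angle index of each spoke on a grid of F(fib_order) angles.

    The ratio F(fib_order - 1) / F(fib_order) approximates 1/phi, so the
    spoke-to-spoke step approximates the golden angle pi/phi.
    """
    base = fibonacci(fib_order)      # number of equidistant angles in [0, pi)
    step = fibonacci(fib_order - 1)  # rational stand-in for base / phi
    return [(k * step) % base for k in range(num_spokes)]


def raga_angles(num_spokes, fib_order):
    """Spoke angles in radians, each an exact multiple of pi / F(fib_order)."""
    base = fibonacci(fib_order)
    return [math.pi * i / base for i in raga_indices(num_spokes, fib_order)]


# With F(10) = 55 base angles the increment index is F(9) = 34.
print(raga_indices(5, 10))  # [0, 34, 13, 47, 26]
```

Because consecutive Fibonacci numbers are coprime, the first F(n) spokes visit every grid angle exactly once, so each spoke is fully described by its integer index on a fixed equidistant trajectory.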

arXiv:2312.12246 [pdf, other]

MDD-UNet: Domain Adaptation for Medical Image Segmentation with Theoretical Guarantees, a Proof of Concept

Authors: Asbjørn Munk, Ao Ma, Mads Nielsen

Abstract: The current state-of-the-art techniques for image segmentation are often based on U-Net architectures, U-shaped encoder-decoder networks with skip connections. Despite their strong performance, these architectures often do not perform well when used on data whose characteristics differ from those of the data they were trained on. Many techniques for improving performance in the presence of domain shift have been developed; however, they typically have only loose connections to the theory of domain adaptation. In this work, we propose an unsupervised domain adaptation framework for U-Nets with theoretical guarantees based on the Margin Disparity Discrepancy [1], called the MDD-UNet. We evaluate the proposed technique on the task of hippocampus segmentation, and find that the MDD-UNet is able to learn features which are domain-invariant with no knowledge about the labels in the target domain. The MDD-UNet improves performance over the standard U-Net on 11 out of 12 combinations of datasets. This work serves as a proof of concept by demonstrating an improvement on the U-Net in its standard form without modern enhancements, which opens up a new avenue of studying domain adaptation for models with very large hypothesis spaces from both methodological and practical perspectives. Code is available at https://github.com/asbjrnmunk/mdd-unet.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Published at NLDL 2024

arXiv:2312.12222 [pdf, other]

EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Authors: Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, Yanfei Zhong

Abstract: Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded. As objects are the basis for complex relational reasoning, we propose a Semantic OBject Awareness framework (SOBA) to advance VQA in an object-centric way. To preserve refined spatial locations and semantics, SOBA leverages a segmentation network for object semantics generation. The object-guided attention aggregates object interior features via pseudo masks, and bidirectional cross-attention further models object external relations hierarchically. To optimize object counting, we propose a numerical difference loss that dynamically adds difference penalties, unifying the classification and regression tasks. Experimental results show that SOBA outperforms both advanced general and remote sensing methods. We believe this dataset and framework provide a strong benchmark for Earth vision's complex analysis. The project page is at https://Junjue-Wang.github.io/homepage/EarthVQA.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use cases. Evaluation on a broad range of benchmarks shows that our most capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11468 [pdf, other]

doi 10.1002/mrm.30135

Bias-Reduced Neural Networks for Parameter Estimation in Quantitative MRI

Authors: Andrew Mao, Sebastian Flassbeck, Jakob Assländer

Abstract: Purpose: To develop neural network (NN)-based quantitative MRI parameter estimators with minimal bias and a variance close to the Cramér-Rao bound. Theory and Methods: We generalize the mean squared error loss to control the bias and variance of the NN's estimates, which involves averaging over multiple noise realizations of the same measurements during training. Bias and variance properties of the resulting NNs are studied for two neuroimaging applications. Results: In simulations, the proposed strategy reduces the estimates' bias throughout parameter space and achieves a variance close to the Cramér-Rao bound. In vivo, we observe good concordance between parameter maps estimated with the proposed NNs and traditional estimators, such as non-linear least-squares fitting, while state-of-the-art NNs show larger deviations. Conclusion: The proposed NNs have greatly reduced bias compared to those trained using the mean squared error and offer significantly improved computational efficiency over traditional estimators with comparable or better accuracy.

Submitted 10 April, 2024; v1 submitted 13 November, 2023; originally announced December 2023.
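The abstract describes generalizing the mean squared error loss by averaging over several noise realizations of the same measurement. One plausible form, assumed here for illustration rather than taken from the paper, splits the error into squared bias and variance across those realizations and weights the two terms separately; with a weight of 1 it reduces to the ordinary MSE averaged over realizations. The function and parameter names are hypothetical, and a real training loop would compute this on framework tensors.

```python
def bias_variance_loss(estimates, target, variance_weight=1.0):
    """Squared bias plus weighted variance over noise realizations.

    `estimates` are one network's outputs for several noisy copies of the
    same underlying measurement; `target` is the true parameter value.
    With variance_weight = 1 this equals the MSE averaged over realizations.
    """
    m = len(estimates)
    mean_estimate = sum(estimates) / m
    bias_sq = (mean_estimate - target) ** 2
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / m
    return bias_sq + variance_weight * variance


estimates = [1.0, 2.0, 3.0]  # illustrative values only
mse = sum((e - 1.5) ** 2 for e in estimates) / len(estimates)
print(bias_variance_loss(estimates, 1.5), mse)
```

Lowering `variance_weight` below 1 shifts the emphasis toward removing bias, which is the kind of trade-off the abstract describes controlling.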

arXiv:2312.07871 [pdf, other]

MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation

Authors: Yanzuo Lu, Meng Shen, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

Abstract: Universal domain adaptation (UniDA) is a practical but challenging problem, in which information about the relation between the source and the target domains is not given for knowledge transfer. Existing UniDA methods may suffer from overlooking intra-domain variations in the target domain and from difficulty in separating similar known and unknown classes. To address these issues, we propose a novel Mutual Learning Network (MLNet) with neighborhood invariance for UniDA. In our method, confidence-guided invariant feature learning with self-adaptive neighbor selection is designed to reduce the intra-domain variations for more generalizable feature representation. By using the cross-domain mixup scheme for better unknown-class identification, the proposed method compensates for the misidentified known-class errors by mutual learning between the closed-set and open-set classifiers. Extensive experiments on three publicly available benchmarks demonstrate that our method achieves the best results compared to state-of-the-art methods in most cases and significantly outperforms the baseline across all four settings in UniDA. Code is available at https://github.com/YanzuoLu/MLNet.

Submitted 27 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024 (Poster)

arXiv:2312.05619 [pdf, ps, other]

doi 10.1103/PhysRevD.109.056017

Contributions of the subprocess $K^*_0(1430) \to Kη^{\prime}$ in the charmless three-body $B$ meson decays

Authors: Ai-Jun Ma, Wen-Fei Wang

Abstract: We study the contributions of the $Kη^{\prime}$ pair originating from the scalar intermediate state $K_0^{*}(1430)$ in the three-body decays $B\to Kη^{\prime} h$ ($h=π, K$) within the perturbative QCD approach. The contribution of $K^*_0(1430)\to Kη^{\prime}$ is described by the Flatté formula with coupled channels $Kπ$, $Kη$ and $Kη^{\prime}$. The strong coupling constants $g_{K^*_0Kη^{(\prime)}}$ are extracted from $g_{K^*_0 Kπ}$ within flavor SU$(3)$ symmetry. In spite of the strong phase-space suppression near the $Kη^\prime$ threshold, the $CP$-averaged branching fractions for the $B\to K^*_0(1430) h \to Kη^\prime h$ decays are predicted to be on the order of $10^{-8}$ to $10^{-5}$, which are non-negligible for the corresponding three-body $B$ decays. Since the $Kη$ system is almost decoupled from the even-spin strange mesons under flavor SU$(3)$ symmetry, the quasi-two-body $B$ decays with the subprocess $K^*_0(1430) \to K η$ should have quite small branching ratios and are not taken into account in this work. We also estimate that the branching fraction for $K_0^{*}(1430)\to Kη^{\prime}$ is about one fifth of that for $K_0^{*}(1430)\to Kπ$. The predictions for the relevant decays are expected to be tested by the LHCb and Belle-II experiments in the future.

Submitted 20 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: 10 pages, 2 figures and 4 tables. Matching the published version in PRD

Journal ref: Phys. Rev. D 109, 056017 (2024)
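A coupled-channel Flatté amplitude of the kind named in the abstract can be sketched as below. This uses one common textbook parametrization, with each channel's width term written as g²ρ(s); the abstract does not give the paper's exact normalization, and the couplings in the usage line are placeholders, not the SU(3)-derived values. The phase-space factor ρ vanishes at the Kη′ threshold, which is the phase-space effect the abstract refers to.

```python
import cmath

# Meson masses in GeV (PDG values, rounded).
M_K = 0.493677
M_PI = 0.13957
M_ETA = 0.547862
M_ETA_PRIME = 0.95778


def phase_space(s, m1, m2):
    """Two-body phase-space factor rho(s); zero at threshold, imaginary below it."""
    return cmath.sqrt((1 - (m1 + m2) ** 2 / s) * (1 - (m1 - m2) ** 2 / s))


def flatte(s, m0, channels):
    """Coupled-channel Flatte amplitude 1 / (m0^2 - s - i * sum_j g_j^2 rho_j(s)).

    `channels` is a list of (g, m1, m2) tuples, one per decay channel.
    """
    width_term = sum(g ** 2 * phase_space(s, m1, m2) for g, m1, m2 in channels)
    return 1.0 / (m0 ** 2 - s - 1j * width_term)


# Placeholder couplings (GeV) for the K pi, K eta and K eta' channels;
# m0 = 1.425 GeV is an approximate K0*(1430) mass.
channels = [(1.0, M_K, M_PI), (0.5, M_K, M_ETA), (0.5, M_K, M_ETA_PRIME)]
print(abs(flatte(1.5 ** 2, 1.425, channels)))
```

Using `cmath.sqrt` lets the same expression continue analytically below threshold, where a channel contributes a real shift to the denominator instead of a width.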

arXiv:2312.00111 [pdf, other]

Multimodal Learning for Materials

Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić

Abstract: Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning efforts in materials science focus primarily on single-modality tasks, i.e., relationships between materials and a single physical property, thus not taking advantage of the rich and multimodal set of material properties. Here, we introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes: (i) MultiMat achieves state-of-the-art performance for challenging material property prediction tasks; (ii) MultiMat enables novel and accurate material discovery via latent space similarity, enabling screening for stable materials with desired properties; and (iii) MultiMat encodes interpretable emergent features that may provide novel scientific insights.

Submitted 12 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: 11 pages, 4 figures
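Screening by latent-space similarity amounts to ranking candidate materials by how close their embeddings lie to a query embedding. A minimal sketch, assuming cosine similarity as the metric and plain Python lists standing in for embeddings from a trained encoder (the vectors and names below are made up for illustration), could look like this:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_candidates(query, candidates):
    """Return candidate names sorted from most to least similar to `query`."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


# Hypothetical 2-D embeddings; real latent spaces are much higher-dimensional.
print(rank_candidates([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.1], "c": [-1.0, 0.0]}))
```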

arXiv:2311.18495 [pdf, other]

Improving Adversarial Transferability via Model Alignment

Authors: Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, **dong Gu

Abstract: Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability to generate transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measures the divergence in the predictions between the source model and another, independently trained model, referred to as the witness model. To understand the effect of model alignment, we conduct a geometric analysis of the resulting changes in the loss landscape. Extensive experiments on the ImageNet dataset, using a variety of model architectures, demonstrate that perturbations generated from aligned source models exhibit significantly higher transferability than those from the original source model.

Submitted 30 November, 2023; originally announced November 2023.
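The abstract does not specify which divergence the alignment loss uses. Assuming, for illustration only, the Kullback–Leibler divergence from the witness model's softmax output to the source model's, the quantity being minimized for a single input could be computed as below (function names are hypothetical; the fine-tuning loop itself would need an autodiff framework and is omitted):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(source_logits, witness_logits):
    """KL(witness || source) between the two models' predictive distributions."""
    p = softmax(witness_logits)
    q = softmax(source_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(alignment_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

The loss is zero when both models agree exactly and grows as their predictions diverge; adding a constant to all logits of either model leaves it unchanged.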

arXiv:2311.10266 [pdf, other]

Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2

Authors: Ambri Ma, Arnav Kumar, Brett Zeligson

Abstract: The training of large language models (LLMs) on extensive, unfiltered corpora sourced from the internet is a common and advantageous practice. Consequently, LLMs have learned and inadvertently reproduced various types of biases, including violent, offensive, and toxic language. However, recent research shows that generative pretrained transformer (GPT) language models can recognize their own biases and detect toxicity in generated content, a process referred to as self-diagnosis. In response, researchers have developed a decoding algorithm that allows LLMs to self-debias, or reduce their likelihood of generating harmful text. This study investigates the efficacy of the diagnosing-debiasing approach in mitigating two additional types of biases: insults and political bias. These biases are often used interchangeably in discourse, despite exhibiting potentially dissimilar semantic and syntactic properties. We aim to contribute to the ongoing effort of investigating the ethical and social implications of human-AI interaction.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 9 pages

arXiv:2311.02762

Fast Sparse 3D Convolution Network with VDB

Authors: Fangjun Zhou, Anyong Mao, Eftychios Sifakis

Abstract: We propose a new Convolutional Neural Network implementation optimized for sparse 3D data inference. This implementation uses NanoVDB as the data structure to store the sparse tensor. It has a relatively small memory footprint while maintaining high performance. We demonstrate that this architecture is around 20 times faster than the state-of-the-art dense CNN model on a high-resolution 3D object classification network.

Submitted 14 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: Unauthorized publication
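The benefit of a sparse storage format is that convolution only visits occupied voxels. The sketch below uses a Python dict keyed by integer voxel coordinates as a stand-in for NanoVDB (it does not use the NanoVDB API and is not the paper's implementation) and computes a single-channel, submanifold-style convolution whose outputs exist only at active input sites. As in most CNN libraries, the operation is a cross-correlation; the function name is hypothetical.

```python
from itertools import product


def sparse_conv3d(voxels, kernel, radius=1):
    """Submanifold-style sparse 3D convolution (single channel).

    voxels: dict mapping (x, y, z) integer coordinates to a float feature.
    kernel: dict mapping offsets (dx, dy, dz) in [-radius, radius]^3 to weights;
            missing offsets are treated as zero weights.
    Outputs are computed only at active input sites, so work and memory scale
    with the number of occupied voxels rather than the dense grid size.
    """
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    out = {}
    for (x, y, z) in voxels:
        acc = 0.0
        for dx, dy, dz in offsets:
            neighbour = (x + dx, y + dy, z + dz)
            if neighbour in voxels:  # empty space contributes nothing
                acc += kernel.get((dx, dy, dz), 0.0) * voxels[neighbour]
        out[(x, y, z)] = acc
    return out


ones = {d: 1.0 for d in product((-1, 0, 1), repeat=3)}
print(sparse_conv3d({(0, 0, 0): 2.0, (1, 0, 0): 3.0, (10, 10, 10): 1.0}, ones))
```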

arXiv:2311.00413 [pdf, other]

doi 10.1007/JHEP01(2024)047

The $ρ(770,1450)\to ωπ$ contributions for three-body decays $B\to\bar{D}^{(*)} ωπ$

Authors: Yu-Shan Ren, Ai-Jun Ma, Wen-Fei Wang

Abstract: The decays $B\to\bar{D}^{(*)} ωπ$ are very important for investigating $ρ$ excitations and testing the factorization hypothesis for $B$ meson decays. The decays $B^{+}\to \bar{D}^{(*)0}ωπ^+$ and $B^{0}\to D^{(*)-}ωπ^+$ have been measured by different collaborations, but without any theoretical predictions for their observables. In this work, we study the contributions of $ρ(770,1450)\to ωπ$ for the cascade decays $B^{+}\to \bar{D}^{(*)0} ρ^+ \to \bar{D}^{(*)0}ωπ^+$, $B^{0}\to D^{(*)-} ρ^+ \to D^{(*)-}ωπ^+$ and $B_s^{0}\to D_s^{(*)-} ρ^+ \to D_s^{(*)-}ωπ^+$. We introduce the $ρ(770,1450)\to ωπ$ subprocesses into the distribution amplitudes for the $ωπ$ system via the vector form factor $F_{ωπ}(s)$ and then predict, for the first time, the branching fractions for the concerned quasi-two-body decays with $ρ(770,1450)\to ωπ$, as well as the corresponding longitudinal polarization fractions $Γ_L/Γ$ for the cases with the vector $\bar{D}^{*0}$ or $D_{(s)}^{*-}$ in their final states. The branching fractions of these quasi-two-body decays are predicted to be of order $10^{-3}$, which can be detected at the LHCb and Belle-II experiments. The predictions for the decays ${B}^0 \to{D}^{*-} ρ(770)^+\to {D}^{*-} ωπ^+$ and ${B}^0 \to {D}^{*-} ρ(1450)^+\to {D}^{*-} ωπ^+$ agree well with the measurements from the Belle Collaboration. In order to avoid pollution from annihilation Feynman diagrams, we recommend using the $B_s^0 \to D_s^{*-}ρ(770,1450)^+$ decays, which have only emission diagrams at the quark level, to test the factorization hypothesis for $B$ decays.

Submitted 12 March, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 30 pages, 4 figures, the typos in Eq.(2.33) were corrected

Journal ref: JHEP01(2024)047

arXiv:2310.19859 [pdf, other]

Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

Authors: Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Ao Ma, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

Abstract: Parameter-efficient tuning has become a popular way to transfer large-scale foundation models to downstream applications. Existing methods typically embed lightweight tuners into the backbone, so both the design and the learning of the tuners depend heavily on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbone. With both theoretical and empirical evidence, we show that popular tuning approaches have equivalent counterparts under our unbinding formulation, and hence can be integrated into our framework effortlessly. Thanks to this structural disentanglement, the design of tuners is freed from the network architecture, facilitating flexible combinations of various tuning strategies. We further propose a memory-efficient variant of Res-Tuning, where the bypass (i.e., the path formed by a sequence of tuners) is effectively detached from the main branch, such that gradients are back-propagated only to the tuners and not to the backbone. Such a detachment also allows a single backbone forward pass to serve multi-task inference. Extensive experiments on both discriminative and generative tasks demonstrate the superiority of our method over existing alternatives in both efficacy and efficiency. Project page: https://res-tuning.github.io/.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.17626 [pdf, ps, other]

A Survey on Transferability of Adversarial Examples across Deep Neural Networks

Authors: Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

Abstract: The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models into making erroneous predictions, raising concerns for safety-critical applications. An intriguing property of this phenomenon is the transferability of adversarial examples: perturbations crafted for one model can deceive another, often one with a different architecture. This property enables black-box attacks, which circumvent the need for detailed knowledge of the target model. This survey explores the landscape of the transferability of adversarial examples. We categorize existing methodologies for enhancing adversarial transferability and discuss the fundamental principles guiding each approach. While most research concentrates on image classification, we also extend our discussion to other vision tasks and beyond. Finally, we discuss challenges and opportunities, highlighting the importance of fortifying DNNs against adversarial vulnerabilities in an evolving landscape.

Submitted 1 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to Transactions on Machine Learning Research (TMLR)

arXiv:2310.14774 [pdf, ps, other]

Principled Approaches for Learning to Defer with Multiple Experts

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong $H$-consistency bounds. We illustrate the application of our analysis through several examples of practical surrogate losses, for which we give explicit guarantees. These loss functions readily lead to the design of new learning-to-defer algorithms based on their minimization. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on the SVHN and CIFAR-10 datasets.
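In this line of work, the target (non-surrogate) deferral loss is usually stated as follows. The learner pays 1 when it predicts a wrong class itself. When it defers to expert $j$, it pays that expert's cost $c_j(x, y)$. The minimal sketch below evaluates that target loss for one example; it does not implement the paper's surrogate losses. Two details are illustrative assumptions. First, the label encoding: indices `0..n_classes-1` are class predictions and `n_classes + j` means "defer to expert `j`". Second, the hypothetical helper `expert_cost`, which charges the expert's error plus a fixed query cost.

```python
def deferral_loss(h_x, y, expert_costs, n_classes):
    """Target deferral loss for a single example (x, y).

    h_x: the learner's output index. 0..n_classes-1 are class predictions;
         n_classes + j means "defer to expert j" (assumed encoding).
    expert_costs: list of costs c_j(x, y), one per expert.
    """
    if h_x < n_classes:
        # The learner predicts itself: pay 1 if the prediction is wrong.
        return 1.0 if h_x != y else 0.0
    # The learner defers: pay the chosen expert's cost.
    return expert_costs[h_x - n_classes]


def expert_cost(expert_prediction, y, query_cost=0.0):
    """Hypothetical cost model: expert's 0-1 error plus a fixed query cost."""
    return (1.0 if expert_prediction != y else 0.0) + query_cost
```

For example, with 3 classes, true label 2, and two experts predicting 2 and 1, deferring to the first expert costs only the query cost. Predicting class 0 directly costs 1.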

Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: ISAIM 2024

arXiv:2310.14772 [pdf, other]

Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction at some predefined cost. We present a series of new theoretical and algorithmic results for this learning problem in the predictor-rejector framework. We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees, thereby positively resolving two existing open questions. These guarantees provide upper bounds on the estimation error of the abstention loss function in terms of that of the surrogate loss. We analyze two settings. In the single-stage setting, the predictor and rejector are learned simultaneously. In the two-stage setting, which is crucial in applications, the predictor is learned in a first stage using a standard surrogate loss such as cross-entropy. These guarantees suggest new multi-class abstention algorithms based on minimizing these surrogate losses. We also report the results of extensive experiments comparing these algorithms to the current state-of-the-art algorithms on the CIFAR-10, CIFAR-100 and SVHN datasets. Our results demonstrate empirically the benefit of our new surrogate losses and show the remarkable performance of our broadly applicable two-stage abstention algorithm.
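In the predictor-rejector framework, the target loss is commonly written as $L(h, r, x, y) = 1_{h(x) \neq y}\,1_{r(x) > 0} + c\,1_{r(x) \le 0}$. The learner abstains, and pays $c$, whenever the rejector score is non-positive. The sketch below evaluates this target loss and its empirical average from given predictions and rejector scores. It assumes that sign convention and does not reproduce the paper's surrogate losses. In the two-stage setting described above, the predictions would come from a first-stage model (for example, one trained with cross-entropy). Only the rejector scores would then be learned.

```python
def predictor_rejector_loss(prediction, rejector_score, y, cost):
    """Abstention loss for one example: abstain (pay `cost`) when the
    rejector score is <= 0, otherwise pay 1 if the prediction is wrong."""
    if rejector_score <= 0:
        return cost
    return 1.0 if prediction != y else 0.0


def empirical_abstention_risk(predictions, rejector_scores, labels, cost):
    """Average predictor-rejector abstention loss over a labeled sample."""
    losses = [
        predictor_rejector_loss(p, r, y, cost)
        for p, r, y in zip(predictions, rejector_scores, labels)
    ]
    return sum(losses) / len(losses)
```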

Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: ALT 2024

arXiv:2310.14770 [pdf, ps, other]

Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. We prove strong non-asymptotic and hypothesis set-specific consistency guarantees for these surrogate losses, which upper-bound the estimation error of the abstention loss function in terms of the estimation error of the surrogate loss. Our bounds can help compare different score-based surrogates and guide the design of novel abstention algorithms by minimizing the proposed surrogate losses. We experimentally evaluate our new algorithms on the CIFAR-10, CIFAR-100, and SVHN datasets and demonstrate the practical significance of our new surrogate losses and two-stage abstention algorithms. Our results also show that the relative performance of the state-of-the-art score-based surrogate losses can vary across datasets.

Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: AISTATS 2024

arXiv:2310.10147 [pdf, ps, other]

Block-missing data in linear systems: An unbiased stochastic gradient descent approach

Authors: Chelsea Huynh, Anna Ma, Michael Strand

Abstract: Achieving accurate approximations to solutions of large linear systems is crucial, especially when those systems utilize real-world data. A consequence of using real-world data is that missing entries are inevitable. Current approaches for dealing with missing data, such as deletion and imputation, can introduce bias. Recent studies have proposed an adaptation of stochastic gradient descent (SGD) for specific missing-data models. In this work, we propose a new algorithm, $\ell$-tuple mSGD, for the setting in which data is missing in a block-wise, tuple pattern. We prove that our proposed method uses unbiased estimates of the gradient of the least squares objective in the presence of tuple missing data. We also draw connections between $\ell$-tuple mSGD and previously established SGD-type methods for missing data. Furthermore, we prove that our algorithm converges when using updating step sizes and empirically demonstrate the convergence of $\ell$-tuple mSGD on synthetic data. Lastly, we evaluate $\ell$-tuple mSGD on real-world continuous glucose monitoring (CGM) device data.
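The abstract does not spell out the $\ell$-tuple estimator. The sketch below therefore swaps in a simpler entrywise (Bernoulli) missingness model of the kind used in earlier missing-data SGD work. Each entry of a row is observed independently with a known probability $p$, and missing entries are stored as zero. The sketch shows how to debias the gradient of $\tfrac12(a^\top x - b)^2$ computed from an observed row. For the masked row $\tilde a$, $\mathbb{E}[\tilde a \tilde a^\top] = p^2 aa^\top + (p - p^2)\,\mathrm{diag}(aa^\top)$, so the diagonal must be corrected. The function names are hypothetical, and the paper's block-wise correction differs. The second helper checks unbiasedness exactly by averaging over all observation patterns.

```python
import itertools


def masked_gradient(a_obs, b, x, p):
    """Unbiased estimate of a * (a.x - b) from a row whose entries were each
    observed with probability p (missing entries stored as 0 in a_obs)."""
    ax = sum(ai * xi for ai, xi in zip(a_obs, x))
    grad = []
    for k in range(len(x)):
        # (1/p^2) * (a_obs a_obs^T x)_k, minus the bias on the diagonal term
        g = a_obs[k] * ax / p**2 - (1 - p) * a_obs[k] ** 2 * x[k] / p**2
        # (1/p) * a_obs * b debiases the right-hand-side term
        g -= a_obs[k] * b / p
        grad.append(g)
    return grad


def expected_masked_gradient(a, b, x, p):
    """Exact expectation of masked_gradient over all 2^d observation masks."""
    total = [0.0] * len(a)
    for mask in itertools.product([0, 1], repeat=len(a)):
        prob = 1.0
        for m in mask:
            prob *= p if m else 1 - p
        a_obs = [ai * m for ai, m in zip(a, mask)]
        g = masked_gradient(a_obs, b, x, p)
        total = [t + prob * gk for t, gk in zip(total, g)]
    return total
```

The expectation helper is only a check and is exponential in the dimension. In an actual SGD loop, one would call `masked_gradient` on each sampled, partially observed row.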

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06837 [pdf, other]

Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency

Authors: Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber

Abstract: Developing an educational test can be expensive and time-consuming, as each item must be written by experts and then evaluated by collecting hundreds of student responses. Moreover, to closely monitor students' progress, many tests require multiple distinct sets of questions, known as parallel tests, administered throughout the school year. In this study, we focus on tests of silent sentence reading efficiency, used to assess students' reading ability over time. To generate high-quality parallel tests, we propose to fine-tune large language models (LLMs) to simulate how previous students would have responded to unseen items. With these simulated responses, we can estimate each item's difficulty and ambiguity. We first use GPT-4 to generate new test items following a list of expert-developed rules and then apply a fine-tuned LLM to filter the items based on criteria from psychological measurements. We also propose an optimal-transport-inspired technique for generating parallel tests. Based on crowdworker responses, we show that the generated tests closely match the original test's difficulty and reliability. Our evaluation of a generated test with 234 students from grades 2 to 8 produces test scores highly correlated (r=0.93) with those of a standard test form written by human experts and evaluated across thousands of K-12 students.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Main)

arXiv:2309.17031 [pdf, other]

Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Authors: Zhuo Zheng, Shiqi Tian, Ailong Ma, Liangpei Zhang, Yanfei Zhong

Abstract: Understanding the temporal dynamics of Earth's surface is a mission of multi-temporal remote sensing image analysis. This mission has been significantly advanced by deep vision models, whose fuel is labeled multi-temporal images. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial, since it is expensive and knowledge-intensive. In this paper, we present a scalable multi-temporal remote sensing change data generator via generative modeling, which is cheap and automatic, alleviating these problems. Our main idea is to simulate a stochastic change process over time. We model the stochastic change process as a probabilistic semantic state transition, namely the generative probabilistic change model (GPCM). GPCM decouples the complex simulation problem into two more tractable sub-problems, i.e., change event simulation and semantic change synthesis. To solve these two problems, we present the change generator (Changen), a GAN-based GPCM that enables controllable object change data generation, including customizable object properties and change events. Extensive experiments suggest that Changen has superior generation capability, and change detectors pre-trained with Changen exhibit excellent transferability to real-world change datasets.

Submitted 29 September, 2023; originally announced September 2023.

Comments: ICCV 2023

arXiv:2309.15309 [pdf, other]

The importance of quality in austere times: University competitiveness and grant income

Authors: Ye Sun, Athen Ma, Georg von Graevenitz, Vito Latora

Abstract: After 2009, many governments implemented austerity measures, often restricting science funding. Did such restrictions further skew grant income towards elite scientists and universities? And did increased competition for funding undermine participation? UK science funding agencies significantly reduced the number of grants and total grant funding in response to austerity, but, surprisingly, restrictions on science funding were relaxed after the 2015 general election. Exploiting this natural experiment, we show that conventional measures of university competitiveness are poor proxies for competitiveness. An alternative measure of university competitiveness, drawn from complexity science, captures the highly dynamical way in which universities engage in scientific subjects. Building on a data set of 43,430 UK-funded grants between 2006 and 2020, we analyse rankings of UK universities and investigate the effect of research competitiveness on grant income. When austerity was relaxed in 2015, the elasticity of grant income with respect to research competitiveness fell, reflecting increased effort by researchers at less competitive universities. These scientists increased the number and size of their grant applications, which increased their grant income. The study reveals how funding agencies, facing heterogeneous competitiveness in the population of scientists, affect research effort across the distribution of competitiveness.
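An elasticity of grant income with respect to competitiveness is the percentage change in income associated with a one-percent change in competitiveness. As a minimal sketch, assume a constant-elasticity relation $Y = A X^{\beta}$. Then $\beta$ is the slope of an ordinary least-squares fit of $\ln Y$ on $\ln X$. This assumption is for illustration only; the paper's actual econometric specification and its complexity-science competitiveness measure are not reproduced here. The function name is hypothetical.

```python
import math


def log_log_elasticity(x, y):
    """OLS slope of ln(y) on ln(x): the constant elasticity beta in y = A * x**beta.
    All values must be positive."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var
```

A fall in this slope between two periods, as reported above for 2015, means that income depends less steeply on competitiveness.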

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.03893 [pdf, other]

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Authors: Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma

Abstract: Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs. These steps are costly and complex, or the resulting data lacks diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, which together generate scalable, diverse and generalizable detection data in a plug-and-play manner. The Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks and facilitate follow-up research. Extensive experiments demonstrate that data scaling-up via DE achieves significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine

arXiv:2308.16904 [pdf, other]

A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

Authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

Abstract: Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers only measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, because it requires no information about the noiseless coefficient matrix, $A$. Moreover, by considering different conditions on the noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
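For reference, the sketch below is the standard randomized Kaczmarz iteration. Each step samples a row with probability proportional to $\|a_i\|^2$ and projects onto the hyperplane $a_i^\top x = b_i$. For a consistent, noiseless system with full column rank, a classical result bounds the expected squared error: it contracts by a factor $1 - 1/R$ per step, with $R = \|A^{\dagger}\|_2^2\|A\|_F^2$. That is the noiseless analogue of the quantity $\tilde R$ above; this sketch is not the paper's doubly-noisy analysis. Passing a noisy $\tilde A$ and $b$ to the same function gives the setting the paper studies, where convergence should only be expected to a neighborhood of the solution.

```python
import random


def randomized_kaczmarz(A, b, iters=1000, seed=0):
    """Randomized Kaczmarz for A x = b, sampling row i with probability
    proportional to ||A[i]||^2 and projecting onto <A[i], x> = b[i]."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(m), weights=row_norms_sq)[0]
        residual = b[i] - sum(aij * xj for aij, xj in zip(A[i], x))
        step = residual / row_norms_sq[i]
        # Orthogonal projection of x onto the hyperplane of row i
        x = [xj + step * aij for xj, aij in zip(x, A[i])]
    return x
```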

Submitted 31 August, 2023; originally announced August 2023.

MSC Class: 15A06; 15A09; 15A10; 15A18; 65F10; 65Y20; 68Q25; 68W20; 68W40

arXiv:2308.07987 [pdf, other]

On Subsampled Quantile Randomized Kaczmarz

Authors: Jamie Haddock, Anna Ma, Elizaveta Rebrova

Abstract: When solving noisy linear systems $Ax = b + c$, the theoretical and empirical performance of stochastic iterative methods, such as the Randomized Kaczmarz algorithm, depends on the noise level. However, if there are a small number of highly corrupt measurements, one can instead use quantile-based methods to guarantee convergence to the solution $x$ of the system, despite the presence of noise. Such methods require the computation of the entire residual vector, which may not be desirable or even feasible in some cases. In this work, we analyze the sub-sampled quantile Randomized Kaczmarz (sQRK) algorithm for solving large-scale linear systems. The algorithm uses a sub-sampled residual to approximate the quantile threshold. We prove that this method converges to the unique solution of the linear system and provide numerical experiments that support our theoretical findings. We additionally remark on the case of extremely small sample sizes and demonstrate the importance of the interplay between the choice of quantile and subset size.
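The sketch below is one reading of the sub-sampled quantile rule described above, not the authors' code. At each step it estimates the $q$-quantile of the residuals $|a_j^\top x - b_j|$ from a random subset of rows. It then draws a row uniformly at random and performs a Kaczmarz projection only if that row's residual is at or below the estimated threshold. Several choices are assumptions: uniform row sampling, the quantile indexing, and the parameter names `q` and `sample_size`.

```python
import random


def signed_residual(row, bi, x):
    """Return <row, x> - bi."""
    return sum(aij * xj for aij, xj in zip(row, x)) - bi


def subsampled_quantile_rk(A, b, q=0.7, sample_size=10, iters=2000, seed=0):
    """Sketch of quantile-based randomized Kaczmarz with a sub-sampled threshold.

    Rows whose residual exceeds the estimated q-quantile are treated as
    possibly corrupted and skipped for that iteration.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Estimate the q-quantile of |Ax - b| from a random subset of rows.
        subset = rng.sample(range(m), sample_size)
        residuals = sorted(abs(signed_residual(A[j], b[j], x)) for j in subset)
        threshold = residuals[min(sample_size - 1, int(q * sample_size))]
        # Pick a row uniformly at random; project only if it looks uncorrupted.
        i = rng.randrange(m)
        r = signed_residual(A[i], b[i], x)
        if abs(r) <= threshold:
            norm_sq = sum(v * v for v in A[i])
            x = [xj - (r / norm_sq) * aij for xj, aij in zip(x, A[i])]
    return x
```

When a few entries of $b$ are grossly corrupted, those rows keep large residuals and are skipped. The iterates then converge to the solution of the uncorrupted rows. How reliably the threshold excludes the corrupted rows depends on both `q` and `sample_size`.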

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.06703 [pdf, other]

Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

Authors: Avery Ma, Yangchen Pan, Amir-massoud Farahmand

Abstract: Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that models trained with these methods differ little in standard generalization performance, but those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation demonstrates the presence of irrelevant frequencies in natural datasets, where alterations do not affect models' generalization performance. However, models trained with adaptive methods show sensitivity to these changes, suggesting that their use of irrelevant frequencies can lead to solutions sensitive to perturbations. To better understand this difference, we study the learning dynamics of gradient descent (GD) and sign gradient descent (signGD) on a synthetic dataset that mirrors natural signals. With a three-dimensional input space, the models optimized with GD and signGD have standard risks close to zero but differ in their adversarial risks. Our result shows that linear models' robustness to $\ell_2$-norm bounded changes is inversely proportional to the model parameters' weight norm: a smaller weight norm implies better robustness. In the context of deep learning, our experiments show that SGD-trained neural networks have smaller Lipschitz constants. This explains why they are more robust to input perturbations than networks trained with adaptive gradient methods.
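The linear-model statement above is consistent with the Cauchy-Schwarz inequality. For a score $f(x) = w^\top x$ and any perturbation with $\|\delta\|_2 \le \epsilon$, we have $|f(x+\delta) - f(x)| = |w^\top\delta| \le \epsilon\|w\|_2$, with equality at $\delta = \epsilon w/\|w\|_2$. So at a fixed margin, a smaller weight norm means a larger perturbation is needed to change the prediction. The helper below is an illustrative sketch of that bound with a hypothetical name, not the paper's experimental code.

```python
def worst_case_l2_perturbation(w, eps):
    """For f(x) = <w, x>, return the perturbation delta with ||delta||_2 = eps
    that changes f the most, and the resulting change eps * ||w||_2."""
    norm = sum(wi * wi for wi in w) ** 0.5
    delta = [eps * wi / norm for wi in w]
    change = sum(wi * di for wi, di in zip(w, delta))  # equals eps * ||w||_2
    return delta, change
```

Scaling `w` down by a factor of 10 shrinks the worst-case change in the score by the same factor.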

Submitted 28 November, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: Accepted at TMLR (Featured Certification). Code: see https://github.com/averyma/opt-robust

arXiv:2307.02035 [pdf, ps, other]

Ranking with Abstention

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract: We introduce a novel framework of ranking with abstention, where the learner can abstain from making a prediction at some limited cost $c$. We present an extensive theoretical analysis of this framework, including a series of $H$-consistency bounds for both the family of linear functions and that of neural networks with one hidden layer. These theoretical guarantees are the state-of-the-art consistency guarantees in the literature. They are upper bounds on the target loss estimation error of a predictor in a hypothesis set $H$, expressed in terms of the surrogate loss estimation error of that predictor. We further argue that our proposed abstention methods are important when using common equicontinuous hypothesis sets in practice. We report the results of experiments illustrating the effectiveness of ranking with abstention.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.08838 [pdf, other]

Differentially Private Domain Adaptation with Theoretical Guarantees

Authors: Raef Bassily, Corinna Cortes, Anqi Mao, Mehryar Mohri

Abstract: In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to a private target domain. We present two $(ε, δ)$-differentially private adaptation algorithms for supervised adaptation, for which we make use of a general optimization problem, recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and is shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but are Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments. They first demonstrate that the non-private versions of our algorithms outperform adaptation baselines. They next show that, for larger values of the target sample size or $ε$, the performance of our private algorithms remains close to that of the non-private formulation.

Submitted 4 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.04730 [pdf, other]

Stochastic Natural Thresholding Algorithms

Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes stochastic natural thresholding (StoNT) algorithms and discusses their convergence guarantees. StoNT extends NT from the deterministic version with linear measurements to a stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.
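The abstract does not give NT's selection rule. The sketch below instead shows stochastic iterative hard thresholding (StoIHT), a stochastic member of the hard-thresholding family mentioned above, for a least-squares objective. It is not natural thresholding. Each iteration takes a gradient step on one randomly chosen term $\tfrac12(a_i^\top x - b_i)^2$ and then keeps only the $k$ largest-magnitude entries. The step size, iteration count and function names are illustrative assumptions. Replacing `hard_threshold` with another thresholding operator would give other variants in the same spirit.

```python
import random


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [vi if i in keep else 0.0 for i, vi in enumerate(v)]


def stochastic_iht(A, b, k, step=1.0, iters=500, seed=0):
    """Stochastic iterative hard thresholding for least squares: a gradient
    step on one random term 0.5 * (a_i . x - b_i)^2, then projection onto
    k-sparse vectors."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        i = rng.randrange(m)
        r = sum(aij * xj for aij, xj in zip(A[i], x)) - b[i]
        x = hard_threshold([xj - step * r * aij for xj, aij in zip(x, A[i])], k)
    return x
```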

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.00357 [pdf, other]

Efficient and Robust Bayesian Selection of Hyperparameters in Dimension Reduction for Visualization

Authors: Yin-Ting Liao, Hengrui Luo, Anna Ma

Abstract: We introduce an efficient and robust auto-tuning framework for hyperparameter selection in dimension reduction (DR) algorithms, focusing on large-scale datasets and arbitrary performance metrics. By leveraging Bayesian optimization (BO) with a surrogate model, our approach enables efficient hyperparameter selection with multi-objective trade-offs and allows us to perform data-driven sensitivity analysis. By incorporating normalization and subsampling, the proposed framework demonstrates versatility and efficiency, as shown in applications to visualization techniques such as t-SNE and UMAP. We evaluate our results on various synthetic and real-world datasets using multiple quality metrics, providing a robust and efficient solution for hyperparameter selection in DR algorithms.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 20 pages, 16 figures

MSC Class: 62F15; 68T09; 94A16

Showing 1–50 of 231 results for author: Ma, A