Search | arXiv e-print repository

EventLFM: Event Camera integrated Fourier Light Field Microscopy for Ultrafast 3D imaging

Authors: Ruipeng Guo, Qianwan Yang, Andrew S. Chang, Guorong Hu, Joseph Greene, Christopher V. Gabel, Sixian You, Lei Tian

Abstract: Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent trade-off between acquisition speed and space-bandwidth product (SBP). Emerging single-shot 3D wide-field techniques offer a promising alternative but are bottlenecked by the synchronous readout constraints of conventional CMOS systems, thus restricting data throughput to maintain high SBP at limited frame rates. To address this, we introduce EventLFM, a straightforward and cost-effective system that overcomes these challenges by integrating an event camera with Fourier light field microscopy (LFM), a state-of-the-art single-shot 3D wide-field imaging technique. The event camera operates on a novel asynchronous readout architecture, thereby bypassing the frame rate limitations inherent to conventional CMOS systems. We further develop a simple and robust event-driven LFM reconstruction algorithm that can reliably reconstruct 3D dynamics from the unique spatiotemporal measurements captured by EventLFM. Experimental results demonstrate that EventLFM can robustly reconstruct fast-moving and rapidly blinking 3D fluorescent samples at kHz frame rates. Furthermore, we highlight EventLFM's capability for imaging of blinking neuronal signals in scattering mouse brain tissues and 3D tracking of GFP-labeled neurons in freely moving C. elegans. We believe that the combined ultrafast speed and large 3D SBP offered by EventLFM may open up new possibilities across many biomedical applications.

Submitted 3 April, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.14324 [pdf, other]

Towards General-Purpose Text-Instruction-Guided Voice Conversion

Authors: Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee

Abstract: This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to determine the attributes of the converted speech, our model adds versatility and specificity to voice conversion. The proposed VC model is a neural codec language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech. It utilizes text instructions as style prompts to modify the prosody and emotional information of the given speech. In contrast to previous approaches, which often rely on employing separate encoders like prosody and content encoders to handle different aspects of the source speech, our model handles various information of speech in an end-to-end manner. Experiments have demonstrated the impressive capabilities of our model in comprehending instructions and delivering reasonable results.

Submitted 16 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted to ASRU 2023

arXiv:2309.06178 [pdf, other]

doi 10.1038/s41467-024-47186-8

Quantum Simulation of the Bosonic Kitaev Chain

Authors: J. H. Busnaina, Z. Shi, A. McDonald, D. Dubyna, I. Nsanzineza, Jimmy S. C. Hung, C. W. Sandbo Chang, A. A. Clerk, C. M. Wilson

Abstract: Superconducting quantum circuits are a natural platform for quantum simulations of a wide variety of important lattice models describing topological phenomena, spanning condensed matter and high-energy physics. One such model is the bosonic analogue of the well-known fermionic Kitaev chain, a 1D tight-binding model with both nearest-neighbor hopping and pairing terms. Despite being fully Hermitian, the bosonic Kitaev chain exhibits a number of striking features associated with non-Hermitian systems, including chiral transport and a dramatic sensitivity to boundary conditions known as the non-Hermitian skin effect. Here, using a multimode superconducting parametric cavity, we implement the bosonic Kitaev chain in synthetic dimensions. The lattice sites are mapped to frequency modes of the cavity, and the $\textit{in situ}$ tunable complex hopping and pairing terms are created by parametric pumping at the mode-difference and mode-sum frequencies, respectively. We experimentally demonstrate important precursors of nontrivial topology and the non-Hermitian skin effect in the bosonic Kitaev chain, including chiral transport, quadrature wavefunction localization, and sensitivity to boundary conditions. Our experiment is an important first step towards exploring genuine many-body non-Hermitian quantum dynamics.

Submitted 12 September, 2023; originally announced September 2023.

Journal ref: Nature Communications 15, 3065 (2024)

arXiv:2309.04661 [pdf, ps, other]

Evaluating the quantum optimal biased bound in a unitary evolution process

Authors: Shoukang Chang, Wei Ye, Xuan Rao, Huan Zhang, Liqing Huang, Mengmeng Luo, Yuetao Chen, Qiang Ma, Shaoyan Gao

Abstract: Seeking the available precision limit of unknown parameters is a significant task in quantum parameter estimation. One often resorts to the widely utilized quantum Cramer-Rao bound (QCRB) based on unbiased estimators to finish this task. Nevertheless, most actual estimators are usually biased in the limited number of trials. For this reason, we introduce two effective error bounds for biased estimators based on a unitary evolution process in the framework of the quantum optimal biased bound. Furthermore, we show their estimation performance by two specific examples of the unitary evolution process, including the phase encoding and the SU(2) interferometer process. Our findings will provide useful guidance for finding the precision limit of unknown parameters.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 11 pages, 3 figures, welcome comments

arXiv:2309.00647 [pdf, other]

Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

Authors: Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyung Chang

Abstract: Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale and gathering keyword-like labeled data is a costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to be suitable for large models with enough capacity and is not practical for training a small footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning that helps the model to enhance the representation power from out-of-domain auxiliary data. Our method notably improves the performance over competitive methods in the FS-KWS benchmark.

Submitted 31 August, 2023; originally announced September 2023.

Comments: Interspeech 2023

arXiv:2309.00090 [pdf, ps, other]

Benford's Law under Zeckendorf expansion

Authors: Sungkon Chang, Steven J. Miller

Abstract: In the literature, Benford's Law is considered for base-b expansions where b>1 is an integer. In this paper, we investigate the distribution of leading "digits" of a sequence of positive integers under other expansions such as Zeckendorf expansion, and declare what Benford's Law should be under generalized Zeckendorf expansion.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.16483 [pdf, other]

Improving Out-of-Distribution Detection in Echocardiographic View Classification through Enhancing Semantic Features

Authors: Jaeik Jeon, Seongmin Ha, Yeonggul Jang, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD), are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obvious variations characteristic of echocardiographic data. In this study, we introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images, demonstrating that these enriched semantic features are key for significantly improving near-OOD instance detection. By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.

Submitted 23 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.16415 [pdf, other]

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

Authors: Kyuhong Shim, Jinkyu Lee, Simyung Chang, Kyuwoong Hwang

Abstract: Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to the non-streaming models. To improve the performance of streaming ASR, knowledge distillation (KD) from the non-streaming to streaming model has been studied, mainly focusing on aligning the output token probabilities. In this paper, we propose a layer-to-layer KD from the teacher encoder to the student encoder. To ensure that features are extracted using the same context, we insert auxiliary non-streaming branches to the student and perform KD from the non-streaming teacher layer to the non-streaming auxiliary layer. We design a special KD loss that leverages the autoregressive predictive coding (APC) mechanism to encourage the streaming model to predict unseen future contexts. Experimental results show that the proposed method can significantly reduce the word error rate compared to previous token probability distillation methods.

Submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted to Interspeech 2023

arXiv:2308.15955 [pdf, other]

doi 10.1051/0004-6361/202346879

Circumgalactic Ly$α$ emission around submillimeter-bright galaxies with different quasar contributions

Authors: Vale González Lobos, Fabrizio Arrigoni Battaia, Seok-Jun Chang, Max Gronke, Guinevere Kauffmann, Chian-Chou Chen, Hai Fu, Aura Obreja, Emanuele P. Farina

Abstract: We present VLT/MUSE observations targeting the extended Lyman-$α$ (Ly$α$) emission of five high-redshift ($z\sim$3-4) submillimeter galaxies (SMGs) with increasing quasar (QSO) radiation: two SMGs, two SMGs hosting a QSO, and one SMG hosting a QSO with a SMG companion (QSO+SMG). These sources should be located in dark matter halos of comparable masses (average mass of $M_{\rm DM}\sim10^{12.2}\,{\rm M}_\odot$). We quantify the luminosity and extent of the Ly$α$ emission, together with its kinematics, and examine four Ly$α$ powering mechanisms: photoionization from QSOs/star formation, shocks by galactic/QSO outflows, gravitational cooling radiation, and Ly$α$ photons resonant scattering. We find a variety of Ly$α$ luminosities and extents, with the QSO+SMG system displaying the most extended and bright nebula, followed by the SMGs hosting a QSO, and finally the undetected circumgalactic medium (CGM) of SMGs. This diversity implies that gravitational cooling is unlikely to be the main powering mechanism. We show that photoionization from the QSO and QSO outflows can contribute to power the emission for average densities $n_{\rm H}>0.5\,$cm$^{-3}$. Moreover, the observed Ly$α$ luminosities scale with the QSO's budget of Ly$α$ photons modulo the dust content in each galaxy, highlighting a possible contribution from resonant scattering of QSO's radiation in powering the nebulae. We find larger Ly$α$ linewidths (FWHM$\gtrsim 1200\,$km$\,$s$^{-1}$) than usually reported around radio-quiet systems, pointing to large-scale outflows. A statistical survey targeting similar high-redshift massive systems with known host properties is needed to confirm our findings.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 17 pages, 9 figures, accepted for publication in ApJ

Journal ref: A&A 679, A41 (2023)

arXiv:2308.14590 [pdf, other]

Twenty-Five Years of Accretion onto the Classical T Tauri Star TW Hya

Authors: Gregory J. Herczeg, Yuguang Chen, Jean-Francois Donati, Andrea K. Dupree, Frederick M. Walter, Lynne A. Hillenbrand, Christopher M. Johns-Krull, Carlo F. Manara, Hans Moritz Guenther, Min Fang, P. Christian Schneider, Jeff A. Valenti, Silvia H. P. Alencar, Laura Venuti, Juan Manuel Alcala, Antonio Frasca, Nicole Arulanantham, Jeffrey L. Linsky, Jerome Bouvier, Nancy S. Brickhouse, Nuria Calvet, Catherine C. Espaillat, Justyn Campbell-White, John M. Carpenter, Seok-Jun Chang, et al. (17 additional authors not shown)

Abstract: Accretion plays a central role in the physics that governs the evolution and dispersal of protoplanetary disks. The primary goal of this paper is to analyze the stability over time of the mass accretion rate onto TW Hya, the nearest accreting solar-mass young star. We measure veiling across the optical spectrum in 1169 archival high-resolution spectra of TW Hya, obtained from 1998--2022. The veiling is then converted to accretion rate using 26 flux-calibrated spectra that cover the Balmer jump. The accretion rate measured from the excess continuum has an average of $2.51\times10^{-9}$~M$_\odot$~yr$^{-1}$ and a Gaussian distribution with a FWHM of 0.22 dex. This accretion rate may be underestimated by a factor of up to 1.5 because of uncertainty in the bolometric correction and another factor of 1.7 because of excluding the fraction of accretion energy that escapes in lines, especially Ly$α$. The accretion luminosities are well correlated with He line luminosities but poorly correlated with H$α$ and H$β$ luminosity. The accretion rate is always flickering over hours but on longer timescales has been stable over 25 years. This level of variability is consistent with previous measurements for most, but not all, accreting young stars.

Submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted by ApJ. 31 pages

arXiv:2308.12872 [pdf, other]

Distribution of Zeckendorf expressions

Authors: Sungkon Chang

Abstract: By Zeckendorf's Theorem, every positive integer is uniquely written as a sum of distinct non-adjacent Fibonacci terms. In this paper, we investigate the asymptotic formula of the number of binary expansions $<x$ that have no adjacent terms, and generalize the result to the setting of general linear recurrences with non-negative integer coefficients.

Submitted 24 August, 2023; originally announced August 2023.
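
For readers unfamiliar with Zeckendorf's Theorem as stated in the abstract above, the following is a minimal illustrative sketch and is not taken from the paper. It uses the standard greedy construction over the Fibonacci terms 1, 2, 3, 5, 8, ...; the function name `zeckendorf` is a hypothetical choice for this example.

```python
def zeckendorf(n: int) -> list[int]:
    """Return the Zeckendorf terms of n, largest first.

    Illustrative sketch only: greedily subtract the largest Fibonacci
    term (from 1, 2, 3, 5, 8, ...) that still fits. The greedy choice
    never selects two adjacent Fibonacci terms.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    # Build the Fibonacci terms 1, 2, 3, 5, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])

    terms = []
    for f in reversed(fibs):
        if f <= n:
            terms.append(f)
            n -= f
    return terms


print(zeckendorf(100))  # 100 = 89 + 8 + 3
```

The paper's subject, the asymptotic count of such expansions below a bound $x$, is not reproduced here; the sketch only shows what a single expansion looks like.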

arXiv:2308.12256 [pdf, other]

doi 10.1145/3604915.3610244

Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders

Authors: Yueqi Wang, Yoni Halpern, Shuo Chang, Jingchen Feng, Elaine Ya Le, Longfei Li, Xujian Liang, Min-Cheng Huang, Shane Li, Alex Beutel, Yaping Zhang, Shuchao Bi

Abstract: Sequential recommenders have been widely used in industry due to their strength in modeling user preferences. While these models excel at learning a user's positive interests, less attention has been paid to learning from negative user feedback. Negative user feedback is an important lever of user control, and comes with an expectation that recommenders should respond quickly and reduce similar recommendations to the user. However, negative feedback signals are often ignored in the training objective of sequential retrieval models, which primarily aim at predicting positive user interactions. In this work, we incorporate explicit and implicit negative user feedback into the training objective of sequential recommenders in the retrieval stage using a "not-to-recommend" loss function that optimizes for the log-likelihood of not recommending items with negative feedback. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system. Furthermore, we address a challenge in measuring recommender responsiveness to negative feedback by developing a counterfactual simulation framework to compare recommender responses between different user actions, showing improved responsiveness from the modeling change.

Submitted 23 August, 2023; originally announced August 2023.

Comments: RecSys 2023 Industry Track
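
The abstract above describes a "not-to-recommend" loss only as optimizing the log-likelihood of not recommending items with negative feedback. One possible reading is sketched below. The softmax over a small candidate set, the function names, and the numerical clamp are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of item logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def not_to_recommend_loss(logits: list[float], negative_item: int) -> float:
    """Negative log-likelihood of NOT recommending `negative_item`.

    Assumption: the recommendation probability p of an item is its
    softmax probability over the candidate set, so the loss is
    -log(1 - p). The clamp avoids log(0) when p rounds to 1.
    """
    p = softmax(logits)[negative_item]
    return -math.log(max(1.0 - p, 1e-12))


# The loss shrinks as the model scores the disliked item lower.
high = not_to_recommend_loss([3.0, 0.0, 0.0], negative_item=0)
low = not_to_recommend_loss([-3.0, 0.0, 0.0], negative_item=0)
print(high > low)
```

In a real retrieval model this term would presumably be combined with the usual positive-interaction objective; how the two are weighted is not stated in the abstract.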

arXiv:2308.10110 [pdf, other]

Robust Mixture-of-Expert Training for Convolutional Neural Networks

Authors: Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, Sijia Liu

Abstract: Sparsely-gated Mixture of Expert (MoE), an emerging deep model architecture, has demonstrated a great promise to enable high-accuracy and ultra-efficient model inference. Despite the growing popularity of MoE, little work investigated its potential to advance convolutional neural networks (CNNs), especially in the plane of adversarial robustness. Since the lack of robustness has become one of the main hurdles for CNNs, in this paper we ask: How to adversarially robustify a CNN-based MoE model? Can we robustly train it like an ordinary CNN model? Our pilot study shows that the conventional adversarial training (AT) mechanism (developed for vanilla CNNs) no longer remains effective to robustify an MoE-CNN. To better understand this phenomenon, we dissect the robustness of an MoE-CNN into two dimensions: Robustness of routers (i.e., gating functions to select data-specific experts) and robustness of experts (i.e., the router-guided pathways defined by the subnetworks of the backbone CNN). Our analyses show that routers and experts are hard to adapt to each other in the vanilla AT. Thus, we propose a new router-expert alternating Adversarial training framework for MoE, termed AdvMoE. The effectiveness of our proposal is justified across 4 commonly-used CNN model architectures over 4 benchmark datasets. We find that AdvMoE achieves 1% ~ 4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsity-gated MoE, leading to more than 50% inference cost reduction. Codes are available at https://github.com/OPTML-Group/Robust-MoE-CNN.

Submitted 19 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2308.07395 [pdf, other]

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

Authors: Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Abstract: Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate. This study examines the use of text injection for auxiliary tasks, which are the non-ASR tasks often performed by an E2E model. In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. The first is capitalization, which is a de-normalization task. The second is turn-taking prediction, which attempts to identify whether a user has completed their conversation turn in a digital assistant interaction. We show results demonstrating that our text injection method boosts capitalization performance for long-tail data, and improves turn-taking detection recall.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.06548 [pdf, other]

Revisiting Vision Transformer from the View of Path Ensemble

Authors: Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou

Abstract: Vision Transformers (ViTs) are normally regarded as a stack of transformer layers. In this work, we propose a novel view of ViTs showing that they can be seen as ensemble networks containing multiple parallel paths with different lengths. Specifically, we equivalently transform the traditional cascade of multi-head self-attention (MSA) and feed-forward network (FFN) into three parallel paths in each transformer layer. Then, we utilize the identity connection in our new transformer form and further transform the ViT into an explicit multi-path ensemble network. From the new perspective, these paths perform two functions: the first is to provide the feature for the classifier directly, and the second is to provide the lower-level feature representation for subsequent longer paths. We investigate the influence of each path for the final prediction and discover that some paths even pull down the performance. Therefore, we propose the path pruning and EnsembleScale skills for improvement, which cut out the underperforming paths and re-weight the ensemble components, respectively, to optimize the path combination and make the short paths focus on providing high-quality representation for subsequent paths. We also demonstrate that our path combination strategies can help ViTs go deeper and act as high-pass filters to filter out partial low-frequency signals. To further enhance the representation of paths served for subsequent paths, self-distillation is applied to transfer knowledge from the long paths to the short paths. This work calls for more future research to explain and design ViTs from new perspectives.

Submitted 12 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023, oral presentation

arXiv:2308.06478 [pdf, ps, other]

Tail bounds for Multivariate Random Tensor Means

Authors: Shih-Yu Chang

Abstract: In our recent research endeavors, we have delved into the realm of tail bounds problems concerning bivariate random tensor means. In this context, tensors are treated as finite-dimensional operators. However, the longstanding challenge of extending the concept of operator means to scenarios involving more than two variables had persisted. The primary objective of this present study is to unveil a collection of tail bounds applicable to multivariate random tensor means. These encompass the weighted arithmetic mean, weighted harmonic mean, and the Karcher mean. These bounds are derived through the utilization of Ando-Hiai's inequalities, alongside tail bounds specifically tailored for multivariate random tensor means employing reverse Ando-Hiai's inequalities, which are rooted in Kantorovich constants. Notably, our methodology involves employing the concept of deformation for operator means with multiple variables, following the principles articulated in Hiai, Seo and Wada's recent work. Additionally, our research contributes to the expansion about the Karcher mean differentiable region from the vicinity of the diagonal identity element within the Cartesian product space of positive definite tensors to the vicinity of the general element within the Cartesian product space of positive definite tensors via the application of the inverse and implicit function theorem.

Submitted 12 August, 2023; originally announced August 2023.

arXiv:2308.03297 [pdf, ps, other]

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

Authors: Hyeong Soo Chang

Abstract: We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a uniform-feasibility constraint of the expected total discounted-cost in a deterministic, history-independent, and stationary policy set. We derive a DP-equation that recursively holds for a CMDP problem and its sub-CMDP problems, where each problem, induced from the parameters of the original CMDP problem, admits a uniformly-optimal feasible policy in its policy set associated with the inputs to the problem. A policy constructed from the DP-equation is shown to achieve the optimal values, defined for the CMDP problem the policy is a solution to, at all states. Based on the result, we discuss off-line and on-line computational algorithms, motivated from policy iteration for MDPs, whose output sequences have local convergences for the original CMDP problem.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.02226 [pdf, other]

doi 10.1162/tacl_a_00606

Learning to Paraphrase Sentences to Different Complexity Levels

Authors: Alison Chi, Li-Kuang Chen, Yi-Chen Chang, Shu-Hui Lee, Jason S. Chang

Abstract: While sentence simplification is an active research topic in NLP, its adjacent tasks of sentence complexification and same-level paraphrasing are not. To train models on all three tasks, we present two new unsupervised datasets. We compare these datasets, one labeled by a weak classifier and the other by a rule-based approach, with a single supervised dataset. Using these three datasets for training, we perform extensive experiments on both multitasking and prompting strategies. Compared to other systems trained on unsupervised parallel data, models trained on our weak classifier labeled dataset achieve state-of-the-art performance on the ASSET simplification benchmark. Our models also outperform previous work on sentence level targeting. Finally, we establish how a handful of Large Language Models perform on these tasks under a zero-shot setting.

Submitted 4 August, 2023; originally announced August 2023.

Comments: This arXiv version is a pre-MIT Press publication version, this paper has been accepted by TACL. 22 pages, 3 figures, 13 tables

arXiv:2308.00638 [pdf, other]

A lanthanide-rich kilonova in the aftermath of a long gamma-ray burst

Authors: Yu-Han Yang, Eleonora Troja, Brendan O'Connor, Chris L. Fryer, Myungshin Im, Joe Durbak, Gregory S. H. Paek, Roberto Ricci, Clécio R. De Bom, James H. Gillanders, Alberto J. Castro-Tirado, Zong-Kai Peng, Simone Dichiara, Geoffrey Ryan, Hendrik van Eerten, Zi-Gao Dai, Seo-Won Chang, Hyeonho Choi, Kishalay De, Youdong Hu, Charles D. Kilpatrick, Alexander Kutyrev, Mankeun Jeong, Chung-Uk Lee, Martin Makler, et al. (2 additional authors not shown)

Abstract: Kilonovae are a rare class of astrophysical transients powered by the radioactive decay of nuclei heavier than iron, synthesized in the merger of two compact objects. Over the first few days, the kilonova evolution is dominated by a large number of radioactive isotopes contributing to the heating rate. On timescales of weeks to months, its behavior is predicted to differ depending on the ejecta composition and merger remnant. However, late-time observations of known kilonovae are either missing or limited. Here we report observations of a luminous red transient with a quasi-thermal spectrum, following an unusual gamma-ray burst of long duration. We classify this thermal emission as a kilonova and track its evolution up to two months after the burst. At these late times, the recession of the photospheric radius and the rapidly-decaying bolometric luminosity ($L_{\rm bol}\propto t^{-2.7\pm 0.4}$) support the recombination of lanthanide-rich ejecta as they cool.

Submitted 2 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: 47 pages, 14 figures, 9 tables; submitted; a minor typo fixed

arXiv:2307.13974 [pdf, other]

Tracking Anything in High Quality

Authors: Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li

Abstract: Visual object tracking is a fundamental video task in computer vision. Recently, the notably increasing power of perception algorithms allows the unification of single/multiobject and box/mask-based tracking. Among them, the Segment Anything Model (SAM) attracts much attention. In this report, we propose HQTrack, a framework for High Quality Tracking anything in videos. HQTrack mainly consists of a video multi-object segmenter (VMOS) and a mask refiner (MR). Given the object to be tracked in the initial frame of a video, VMOS propagates the object masks to the current frame. The mask results at this stage are not accurate enough since VMOS is trained on several closed-set video object segmentation (VOS) datasets and therefore has limited ability to generalize to complex and corner scenes. To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results. As a compelling testament to the effectiveness of our paradigm, without employing any tricks such as test-time data augmentations and model ensemble, HQTrack ranks 2nd in the Visual Object Tracking and Segmentation (VOTS2023) challenge. Code and models are available at https://github.com/jiawen-zhu/HQTrack.

Submitted 26 July, 2023; originally announced July 2023.

Comments: Technical Report

arXiv:2307.13220 [pdf]

One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep Learning (DL) has proven effective for fast MRI image reconstruction, its broader applicability across various imaging scenarios has been constrained. Challenges include the high cost and privacy restrictions associated with acquiring large-scale, diverse training data, coupled with the inherent difficulty of addressing mismatches between training and target data in existing DL methodologies. Here, we present a novel Physics-Informed Synthetic data learning framework for Fast MRI, called PISF. PISF marks a breakthrough by enabling generalized DL for multi-scenario MRI reconstruction through a single trained model. Our approach separates the reconstruction of a 2D image into many 1D basic problems, commencing with 1D data synthesis to facilitate generalization. We demonstrate that training DL models on synthetic data, coupled with enhanced learning techniques, yields in vivo MRI reconstructions comparable to or surpassing those of models trained on matched realistic datasets, reducing the reliance on real-world MRI data by up to 96%. Additionally, PISF exhibits remarkable generalizability across multiple vendors and imaging centers.
Its adaptability to diverse patient populations has been validated through evaluations by ten experienced medical professionals. PISF presents a feasible and cost-effective way to significantly boost the widespread adoption of DL in various fast MRI applications.

Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 38 pages, 19 figures, 5 tables

arXiv:2307.11951 [pdf, other]

A Simple and Efficient RSS-AOA Based Localization with Heterogeneous Anchor Nodes

Authors: Weizhong Ding, Shengming Chang, Shudi Bao

Abstract: Accurate and reliable localization is crucial for various wireless communication applications. Numerous studies have proposed accurate localization methods using hybrid received signal strength (RSS) and angle of arrival (AOA) measurements. However, these studies typically assume identical measurement noise distributions for different anchor nodes, which may not accurately reflect real-world scenarios with varying noise distributions. In this paper, we propose a simple and efficient localization method based on hybrid RSS-AOA measurements that accounts for the varying measurement noises of different nodes. We derive a closed-form estimator for the target location based on the linear weighted least squares (LWLS) algorithm, with each LWLS equation weight being the inverse of its residual variance. Due to the unknown variances of LWLS equation residuals, we employ a two-stage LWLS method for estimation. The proposed method is computationally efficient, adaptable to different types of wireless communication systems and environments, and provides more accurate and reliable localization results compared to existing RSS-AOA localization techniques. Additionally, we derive the Cramer-Rao Lower Bound (CRLB) for the RSS-AOA signal sequences used in the proposed method. Simulation results demonstrate the superiority of the proposed method.

Submitted 21 July, 2023; originally announced July 2023.

arXiv:2307.11950 [pdf, other]

Accurate RSS-Based Localization Using an Opposition-Based Learning Simulated Annealing Algorithm

Authors: Weizhong Ding, Shengming Chang, Shudi Bao, Meng Chen, Jie Sun

Abstract: Wireless sensor networks require accurate target localization, often achieved through received signal strength (RSS) localization estimation based on maximum likelihood (ML). However, ML-based algorithms can suffer from issues such as low diversity, slow convergence, and local optima, which can significantly affect localization performance. In this paper, we propose a novel localization algorithm that combines opposition-based learning (OBL) and simulated annealing algorithm (SAA) to address these challenges. The algorithm begins by generating an initial solution randomly, which serves as the starting point for the SAA. Subsequently, OBL is employed to generate an opposing initial solution, effectively providing an alternative initial solution. The SAA is then executed independently on both the original and opposing initial solutions, optimizing each towards a potential optimal solution. The final solution is selected as the more effective of the two outcomes from the SAA, thereby reducing the likelihood of the algorithm becoming trapped in local optima. Simulation results indicate that the proposed algorithm consistently outperforms existing algorithms in terms of localization accuracy, demonstrating the effectiveness of our approach.

Submitted 21 July, 2023; originally announced July 2023.
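
The procedure outlined in the abstract above (a random initial solution, its opposition-based counterpart, an independent annealing run from each, then keeping the better result) can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: the log-distance path-loss model, the parameters p0 and gamma, the square search region [lo, hi], the cooling schedule, and all function names are assumptions introduced here.

```python
import math
import random

def rss_cost(pos, anchors, rss, p0=-40.0, gamma=3.0):
    """Sum of squared RSS residuals (ML cost under Gaussian noise).

    Assumes a log-distance path-loss model; p0 and gamma are hypothetical.
    """
    total = 0.0
    for (ax, ay), measured in zip(anchors, rss):
        d = max(math.hypot(pos[0] - ax, pos[1] - ay), 1e-6)
        predicted = p0 - 10.0 * gamma * math.log10(d)
        total += (measured - predicted) ** 2
    return total

def opposite(pos, lo, hi):
    """Opposition-based learning: reflect each coordinate within [lo, hi]."""
    return tuple(lo + hi - c for c in pos)

def anneal(cost, start, lo, hi, rng, steps=4000, t0=10.0, alpha=0.998, step=1.0):
    """Plain simulated annealing with geometric cooling (assumed schedule)."""
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        # Gaussian perturbation, clipped to the search region.
        cand = tuple(min(hi, max(lo, c + rng.gauss(0.0, step))) for c in current)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha
    return best, best_cost

def obl_sa_localize(anchors, rss, lo=0.0, hi=10.0, seed=0):
    """Run annealing from a random start and from its opposite; keep the better."""
    rng = random.Random(seed)
    cost = lambda p: rss_cost(p, anchors, rss)
    x0 = (rng.uniform(lo, hi), rng.uniform(lo, hi))
    runs = [anneal(cost, s, lo, hi, rng) for s in (x0, opposite(x0, lo, hi))]
    return min(runs, key=lambda r: r[1])
```

Calling obl_sa_localize(anchors, rss) returns an (estimate, cost) pair. Starting the second run from the reflected point is what gives the two runs different basins to explore, which is the mechanism the abstract credits for reducing the chance of ending in a local optimum.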

arXiv:2307.10468 [pdf]

The Greenland Telescope: Construction, Commissioning, and Operations in Pituffik

Authors: Ming-Tang Chen, Keiichi Asada, Satoki Matsushita, Philippe Raffin, Makoto Inoue, Paul T. P. Ho, Chih-Chiang Han, Derek Kubo, Timothy Norton, Nimesh A. Patel, George Nystrom, Chih-Wei L. Huang, Pierre Martin-Cocher, Jun Yi Koay, Cristina Romero-Cañizales, Ching-Tang Liu, Teddy Huang, Kuan-Yu Liu, Tashun Wei, Shu-Hao Chang, Ryan Chilson, Peter Oshiro, Homin Jiang, Chao-Te Li, Geoffrey Bower, et al. (29 additional authors not shown)

Abstract: In 2018, the Greenland Telescope (GLT) started scientific observation in Greenland. Since then, we have completed several significant improvements and added new capabilities to the telescope system. This paper presents a full review of the GLT system, a summary of our observation activities since 2018, the lessons learned from the operations in the Arctic regions, and the prospect of the telescope.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 26 pages, 11 figures, and 8 tables. This is the version of the article before publication editing, as submitted by an author to Publications of the Astronomical Society of the Pacific. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record will be added when it becomes available

arXiv:2307.09788 [pdf, other]

Density-invariant Features for Distant Point Cloud Registration

Authors: Quan Liu, Hongzi Zhu, Yunsong Zhou, Hongyang Li, Shan Chang, Minyi Guo

Abstract: Registration of distant outdoor LiDAR point clouds is crucial to extending the 3D vision of collaborative autonomous vehicles, and yet is challenging due to small overlapping area and a huge disparity between observed point densities. In this paper, we propose a Group-wise Contrastive Learning (GCL) scheme to extract density-invariant geometric features to register distant outdoor LiDAR point clouds. We mark through theoretical analysis and experiments that contrastive positives should be independent and identically distributed (i.i.d.) in order to train density-invariant feature extractors. We propose upon the conclusion a simple yet effective training scheme to force the feature of multiple point clouds in the same spatial location (referred to as positive groups) to be similar, which naturally avoids the sampling bias introduced by a pair of point clouds to conform with the i.i.d. principle. The resulting fully-convolutional feature extractor is more powerful and density-invariant than state-of-the-art methods, improving the registration recall of distant scenarios on KITTI and nuScenes benchmarks by 40.9% and 26.9%, respectively. Code is available at https://github.com/liuQuan98/GCL.

Submitted 8 August, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

arXiv:2307.07171 [pdf, other]

Certified Robustness for Large Language Models with Self-Denoising

Authors: Zhen Zhang, Guanhua Zhang, Bairu Hou, Wenqi Fan, Qing Li, Sijia Liu, Yang Zhang, Shiyu Chang

Abstract: Although large language models (LLMs) have achieved great success in vast real-world applications, their vulnerabilities towards noisy inputs have significantly limited their uses, especially in high-stakes environments. In these contexts, it is crucial to ensure that every prediction made by large language models is stable, i.e., LLM predictions should be consistent given minor differences in the input. This largely falls into the study of certified robust LLMs, i.e., all predictions of LLM are certified to be correct in a local region around the input. Randomized smoothing has demonstrated great potential in certifying the robustness and prediction stability of LLMs. However, randomized smoothing requires adding noise to the input before model prediction, and its certification performance depends largely on the model's performance on corrupted data. As a result, its direct application to LLMs remains challenging and often results in a small certification radius. To address this issue, we take advantage of the multitasking nature of LLMs and propose to denoise the corrupted inputs with LLMs in a self-denoising manner. Different from previous works like denoised smoothing, which requires training a separate model to robustify LLM, our method enjoys far better efficiency and flexibility. Our experiment results show that our method outperforms the existing certification methods under both certified robustness and empirical robustness. The code is available at https://github.com/UCSB-NLP-Chang/SelfDenoise.

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.07096 [pdf, other]

Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time

Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmi, Yingxiu Chang

Abstract: Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time poses significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LRP encompasses an additional low-rank structure, facilitating a linear constraint on UTIm to help formulate related low-rank structure information. This method allows us to attain globally optimal solutions for UTIm, given proper initialization. However, the initialization process often involves randomness, leading to suboptimal, local minimum values. This paper presents a novel, combined low-rank approximation (CLRA) method designed to mitigate the effects of this random initialization. We introduce three new LRP variants, underpinned by mathematical proof, which allow the UTIm to draw on a richer pool of low-rank structural information. Utilizing this augmented low-rank structural information from both LRP and the proposed variants, we formulate four linear constraints on the UTIm. Employing the proposed CLRA algorithm, we derive global optimal solutions for the UTIm via these four linear constraints. Experimental results highlight the superior performance of our method over existing state-of-the-art approaches, measured in terms of both the recovery number and reduced estimation errors of UTIm.

Submitted 21 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 13 pages for main content; 9 pages for proof of proposed low rank properties; 13 figures

arXiv:2307.00862 [pdf, other]

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

Authors: Rui Sun, Zhecan Wang, Haoxuan You, Noel Codella, Kai-Wei Chang, Shih-Fu Chang

Abstract: Vision-language tasks, such as VQA, SNLI-VE, and VCR are challenging because they require the model's reasoning ability to understand the semantics of the visual world and natural language. Supervised methods working for vision-language tasks have been well-studied. However, solving these tasks in a zero-shot setting is less explored. Since Contrastive Language-Image Pre-training (CLIP) has shown remarkable zero-shot performance on image-text matching, previous works utilized its strong zero-shot ability by converting vision-language tasks into an image-text matching problem, and they mainly consider global-level matching (e.g., the whole image or sentence). However, we find visual and textual fine-grained information, e.g., keywords in the sentence and objects in the image, can be fairly informative for semantics understanding. Inspired by this, we propose a unified framework to take advantage of the fine-grained information for zero-shot vision-language learning, covering multiple tasks such as VQA, SNLI-VE, and VCR. Our experiments show that our framework outperforms former zero-shot methods on VQA and achieves substantial improvement on SNLI-VE and VCR. Furthermore, our ablation studies confirm the effectiveness and generalizability of our proposed method. Code will be available at https://github.com/ThreeSR/UniFine

Submitted 3 July, 2023; originally announced July 2023.

Comments: 14 pages, 4 figures, ACL 2023 Findings

arXiv:2306.16526 [pdf, other]

Shilling Black-box Review-based Recommender Systems through Fake Review Generation

Authors: Hung-Yun Chiang, Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai, Jason S. Chang

Abstract: Review-Based Recommender Systems (RBRS) have attracted increasing research interest due to their ability to alleviate well-known cold-start problems. RBRS utilizes reviews to construct the user and item representations. However, in this paper, we argue that such a reliance on reviews may instead expose systems to the risk of being shilled. To explore this possibility, we propose the first generation-based model for shilling attacks against RBRSs. Specifically, we learn a fake review generator through reinforcement learning, which maliciously promotes items by forcing prediction shifts after adding generated reviews to the system. By introducing the auxiliary rewards to increase text fluency and diversity with the aid of pre-trained language models and aspect predictors, the generated reviews can be effective for shilling with high fidelity. Experimental results demonstrate that the proposed framework can successfully attack three different kinds of RBRSs on the Amazon corpus with three domains and Yelp corpus. Furthermore, human studies also show that the generated reviews are fluent and informative. Finally, equipped with Attack Review Generators (ARGs), RBRSs with adversarial training are much more robust to malicious reviews.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.14667 [pdf, other]

Measuring unequal distribution of pandemic severity across census years, variants of concern and interventions

Authors: Quang Dang Nguyen, Sheryl L. Chang, Christina M. Jamerlan, Mikhail Prokopenko

Abstract: Diverse and complex intervention policies deployed over the last years have shown varied effectiveness in controlling the COVID-19 pandemic. However, a systematic analysis and modelling of the combined effects of different viral lineages and complex intervention policies remains a challenge. Using large-scale agent-based modelling and a high-resolution computational simulation matching census-based demographics of Australia, we carried out a systematic comparative analysis of several COVID-19 pandemic scenarios. The scenarios covered two most recent Australian census years (2016 and 2021), three variants of concern (ancestral, Delta and Omicron), and five representative intervention policies. In addition, we introduced pandemic Lorenz curves measuring an unequal distribution of the pandemic severity across local areas. We quantified nonlinear effects of population heterogeneity on the pandemic severity, highlighting that (i) the population growth amplifies pandemic peaks, (ii) the changes in population size amplify the peak incidence more than the changes in density, and (iii) the pandemic severity is distributed unequally across local areas. We also examined and delineated the effects of urbanisation on the incidence bimodality, distinguishing between urban and regional pandemic waves. Finally, we quantified and examined the impact of school closures, complemented by partial interventions, and identified the conditions when inclusion of school closures may decisively control the transmission.
Our results suggest that (a) public health response to long-lasting pandemics must be frequently reviewed and adapted to demographic changes, (b) in order to control recurrent waves, mass-vaccination rollouts need to be complemented by partial NPIs, and (c) healthcare and vaccination resources need to be prioritised towards the localities and regions with high population growth and/or high density.

Submitted 26 June, 2023; originally announced June 2023.

Comments: 43 pages, 25 figures, source code: https://zenodo.org/record/5778218
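
The abstract above mentions Lorenz curves that measure how unequally pandemic severity is spread across local areas. A generic Lorenz-curve construction is sketched below for orientation; it may well differ from the paper's exact definition. It assumes severity is measured as case counts per area, that areas are ordered by incidence (cases per resident), and that inequality is summarised by a Gini-style coefficient. The function names and input format are hypothetical.

```python
def lorenz_curve(population, cases):
    """Cumulative (population share, case share) points.

    Areas are sorted from lowest to highest incidence (cases per resident),
    so the curve lies on or below the diagonal of perfect equality.
    """
    areas = sorted(zip(population, cases), key=lambda pc: pc[1] / pc[0])
    total_pop = float(sum(population))
    total_cases = float(sum(cases))
    xs, ys = [0.0], [0.0]
    cum_pop = cum_cases = 0.0
    for pop, c in areas:
        cum_pop += pop
        cum_cases += c
        xs.append(cum_pop / total_pop)
        ys.append(cum_cases / total_cases)
    return xs, ys

def gini(xs, ys):
    """Gini-style coefficient: 1 minus twice the trapezoidal area under the curve."""
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return 1.0 - 2.0 * area

# Two equally sized areas, with all cases concentrated in one of them.
xs, ys = lorenz_curve([100, 100], [0, 20])
print(gini(xs, ys))
```

A coefficient of 0 corresponds to identical incidence everywhere, and values closer to 1 indicate that cases are concentrated in a small share of the population.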

arXiv:2306.13145 [pdf, other]

Neoclassical transport of tungsten ion bundles in total-f neoclassical gyrokinetic simulations of a whole-volume JET-like plasma

Authors: J. Dominski, C. S. Chang, R. Hager, S. Ku, E. S. Yoon, V. Parail

Abstract: The application of a bundling technique to model the diverse charge states of tungsten impurity species in total-f gyrokinetic simulations is demonstrated. The gyrokinetic bundling method strategically groups tungsten ions of similar charge, optimizing computational efficiency. The initial radial configuration of these bundles and their respective charges are derived from a coronal approximation and the quasi-neutrality of the plasma. A low-density JET H-mode like plasma is simulated using the neoclassical version of XGC across the entire plasma volume, spanning from the magnetic axis to the divertor. An accumulation of tungsten is observed at the pedestal top, as a result of low-Z tungsten ions moving inward from the scrape-off-layer (SOL) into the core region and high-Z tungsten ions moving outward from the core into the pedestal. This organization of the fluxes cannot be captured by a single tungsten-ion simulation. Large up-down poloidal asymmetries of tungsten form in the pedestal and strongly influence the direction of neoclassical fluxes. The temperature screening effect and its correlation with asymmetries is analyzed.

Submitted 25 January, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 11 pages, 11 figures

arXiv:2306.07457 [pdf, other]

Accurate Measures of Vaccination and Concerns of Vaccine Holdouts from Web Search Logs

Authors: Serina Chang, Adam Fourney, Eric Horvitz

Abstract: To design effective vaccine policies, policymakers need detailed data about who has been vaccinated, who is holding out, and why. However, existing data in the US are insufficient: reported vaccination rates are often delayed or missing, and surveys of vaccine hesitancy are limited by high-level questions and self-report biases. Here, we show how large-scale search engine logs and machine learning can be leveraged to fill these gaps and provide novel insights about vaccine intentions and behaviors. First, we develop a vaccine intent classifier that can accurately detect when a user is seeking the COVID-19 vaccine on search. Our classifier demonstrates strong agreement with CDC vaccination rates, with correlations above 0.86, and estimates vaccine intent rates to the level of ZIP codes in real time, allowing us to pinpoint more granular trends in vaccine seeking across regions, demographics, and time. To investigate vaccine hesitancy, we use our classifier to identify two groups, vaccine early adopters and vaccine holdouts. We find that holdouts, compared to early adopters matched on covariates, are 69% more likely to click on untrusted news sites. Furthermore, we organize 25,000 vaccine-related URLs into a hierarchical ontology of vaccine concerns, and we find that holdouts are far more concerned about vaccine requirements, vaccine development and approval, and vaccine myths, and even within holdouts, concerns vary significantly across demographic groups.
Finally, we explore the temporal dynamics of vaccine concerns and vaccine seeking, and find that key indicators emerge when individuals convert from holding out to preparing to accept the vaccine.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.03683 [pdf, ps, other]

Legendrian mean curvature flow in $η$-Einstein Sasakian manifolds

Authors: Shu-Cheng Chang, Yingbo Han, Chin-Tung Wu

Abstract: Recently, a great deal of work has connected the Legendrian isotopic problem with contact invariants. The isotopic problem of Legendrian curves in a contact 3-manifold was studied via the Legendrian curve shortening flow, which was introduced and studied by K. Smoczyk. On the other hand, in the SYZ Conjecture, one can model a special Lagrangian singularity locally as the special Lagrangian cones in C^{3}. This can be characterized by its link, which is a minimal Legendrian surface in the 5-sphere. From these points of view, in this paper we focus on the existence of the long-time solution and asymptotic convergence along the Legendrian mean curvature flow in higher dimensional η-Einstein Sasakian (2n+1)-manifolds under the suitable stability condition due to the Thomas-Yau conjecture.

Submitted 19 April, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:0906.5527 by other authors

MSC Class: 53C44 (Primary); 53C56 (Secondary)

arXiv:2306.02291 [pdf, other]

3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW

Authors: Shijie Chang, Zeqi Hao, Ben Kang, Xiaoqi Zhao, Jiawen Zhu, Zhenyu Chen, Lihe Zhang, Lu Zhang, Huchuan Lu

Abstract: In this paper, we introduce our 3rd place solution for the PVUW2023 VSS track. Semantic segmentation is a fundamental task in computer vision with numerous real-world applications. We have explored various image-level visual backbones and segmentation heads to tackle the problem of video semantic segmentation. Through our experimentation, we find that using InternImage-H as the backbone and Mask2former as the segmentation head achieves the best performance. In addition, we explore two post-processing methods: CascadePSP and Segment Anything Model (SAM). Ultimately, our approach obtains 62.60% and 64.84% mIoU on the VSPW test set1 and final test set, respectively, securing the third position in the PVUW2023 VSS track.

Submitted 5 June, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: 3rd Place Solution for CVPR 2023 PVUW VSS Track

arXiv:2306.01015 [pdf, other]

doi 10.21437/Interspeech.2023-1079

How to Estimate Model Transferability of Pre-Trained Speech Models?

Authors: Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath

Abstract: In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates using the extracted representations. Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers by making a temporal independent hypothesis. We evaluate some popular supervised speech models (e.g., Conformer RNN-Transducer) and self-supervised speech models (e.g., HuBERT) in cross-layer and cross-model settings using public data. Experimental results show a high Spearman's rank correlation and low $p$-value between our estimation framework and fine-tuning ground truth. Our proposed transferability framework requires less computational time and resources, making it a resource-saving and time-efficient approach for tuning speech foundation models.

Submitted 5 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to Interspeech. Code is available at: https://github.com/virginiakm1988/LogME-CTC. Fixed a typo

arXiv:2305.18641 [pdf, other]

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

Authors: Mingyang Zhou, Yi R. Fung, Long Chen, Christopher Thomas, Heng Ji, Shih-Fu Chang

Abstract: Building cross-model intelligence that can understand charts and communicate the salient information hidden behind them is an appealing challenge in the vision and language (V+L) community. The capability to uncover the underlying table data of chart figures is a critical key to automatic chart understanding. We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs. Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP) to facilitate the model with different skills to interpret the table information. We have conducted extensive experiments on chart question answering and chart summarization to verify the effectiveness of the proposed pre-training strategies. In particular, on the ChartQA benchmark, our ChartT5 outperforms the state-of-the-art non-pretraining methods by over 8% performance gains.

Submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted by Findings of ACL 2023

arXiv:2305.18419 [pdf, other]

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

Authors: W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath

Abstract: We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text; but unfortunately, spoken real-world utterances rarely contain punctuation. We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text. We compare our segmenter, which is distilled from the LM teacher, against a segmenter distilled from an acoustic-pause-based teacher used in other works, on a streaming ASR pipeline. The pipeline with our segmenter achieves a 3.2% relative WER gain along with a 60 ms median end-of-segment latency reduction on a YouTube captioning task.

Submitted 28 May, 2023; originally announced May 2023.

Comments: Interspeech 2023. First 3 authors contributed equally

arXiv:2305.18292 [pdf, other]

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Authors: Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou

Abstract: Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, the utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adaptor) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.

Submitted 9 November, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.17542 [pdf, other]

Non-Sequential Graph Script Induction via Multimedia Grounding

Authors: Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji

Abstract: Online resources such as WikiHow compile a wide range of scripts for performing everyday tasks, which can assist models in learning to reason about procedures. However, the scripts are always presented in a linear manner, which does not reflect the flexibility displayed by people executing tasks in real life. For example, in the CrossTask Dataset, 64.5% of consecutive step pairs are also observed in the reverse order, suggesting their ordering is not fixed. In addition, each step has an average of 2.56 frequent next steps, demonstrating "branching". In this paper, we propose the new challenging task of non-sequential graph script induction, aiming to capture optional and interchangeable steps in procedural planning. To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks. In particular, we design a multimodal framework to ground procedural videos to WikiHow textual steps and thus transform each video into an observed step path on the latent ground truth graph script. This key transformation enables us to train a script knowledge model capable of both generating explicit graph scripts for learnt tasks and predicting future steps given a partial step sequence. Our best model outperforms the strongest pure text/vision baselines by 17.52% absolute gains on F1@3 for next step prediction and 13.8% absolute gains on Acc@1 for partial sequence completion. Human evaluation shows our model outperforming the WikiHow linear baseline by 48.76% absolute gains in capturing sequential and non-sequential step relationships.

Submitted 27 May, 2023; originally announced May 2023.

arXiv:2305.17540 [pdf, other]

Learning from Children: Improving Image-Caption Pretraining via Curriculum

Authors: Hammad A. Ayyubi, Rahul Lokesh, Alireza Zareian, Bo Wu, Shih-Fu Chang

Abstract: Image-caption pretraining has been quite successfully used for downstream vision tasks like zero-shot image classification and object detection. However, image-caption pretraining is still a hard problem -- it requires multiple concepts (nouns) from captions to be aligned to several objects in images. To tackle this problem, we go to the roots -- the best learner, children. We take inspiration from cognitive science studies dealing with children's language learning to propose a curriculum learning framework. The learning begins with easy-to-align image-caption pairs containing one concept per caption. The difficulty is progressively increased with each new phase by adding one more concept per caption. Correspondingly, the knowledge acquired in each learning phase is utilized in subsequent phases to effectively constrain the learning problem to aligning one new concept-object pair in each phase. We show that this learning strategy improves over vanilla image-caption training in various settings -- pretraining from scratch, using a pretrained image and/or pretrained text encoder, low-data regimes, etc.

Submitted 30 May, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: ACL Findings 2023

arXiv:2305.17304 [pdf, other]

External Language Model Integration for Factorized Neural Transducers

Authors: Michael Levit, Sarangarajan Parthasarathy, Cem Aksoylar, Mohammad Sadegh Rasooli, Shuangyu Chang

Abstract: We propose an adaptation method for factorized neural transducers (FNT) with external language models. We demonstrate that both neural and n-gram external LMs add significantly more value when linearly interpolated with predictor output compared to shallow fusion, thus confirming that FNT forces the predictor to act like regular language models. Further, we propose a method to integrate class-based n-gram language models into the FNT framework, resulting in accuracy gains similar to a hybrid setup. We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario through a combination of class-based n-gram and neural LMs.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.14985 [pdf, other]

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

Authors: Haoxuan You, Rui Sun, Zhecan Wang, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang

Abstract: The field of vision-and-language (VL) understanding has made unprecedented progress with end-to-end large pre-trained VL models (VLMs). However, they still fall short in zero-shot reasoning tasks that require multi-step inferencing. To achieve this goal, previous works resort to a divide-and-conquer pipeline. In this paper, we argue that previous efforts have several inherent shortcomings: 1) They rely on domain-specific sub-question decomposing models. 2) They force models to predict the final answer even if the sub-questions or sub-answers provide insufficient information. We address these limitations via IdealGPT, a framework that iteratively decomposes VL reasoning using large language models (LLMs). Specifically, IdealGPT utilizes an LLM to generate sub-questions, a VLM to provide corresponding sub-answers, and another LLM to reason to achieve the final answer. These three modules perform the divide-and-conquer procedure iteratively until the model is confident about the final answer to the main question. We evaluate IdealGPT on multiple challenging VL reasoning tasks under a zero-shot setting. In particular, our IdealGPT outperforms the best existing GPT-4-like models by an absolute 10% on VCR and 15% on SNLI-VE. Code is available at https://github.com/Hxyou/IdealGPT

Submitted 24 May, 2023; originally announced May 2023.

Comments: 13 pages, 5 figures

arXiv:2305.11853 [pdf, other]

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

Authors: Shuaichen Chang, Eric Fosler-Lussier

Abstract: Large language models (LLMs) with in-context learning have demonstrated remarkable capability in the text-to-SQL task. Previous research has prompted LLMs with various demonstration-retrieval strategies and intermediate reasoning steps to enhance the performance of LLMs. However, those works often employ varied strategies when constructing the prompt text for text-to-SQL inputs, such as databases and demonstration examples. This leads to a lack of comparability in both the prompt constructions and their primary contributions. Furthermore, selecting an effective prompt construction has emerged as a persistent problem for future research. To address this limitation, we comprehensively investigate the impact of prompt constructions across various settings and provide insights into prompt constructions for future text-to-SQL studies.

Submitted 26 November, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Journal ref: NeurIPS 2023 Table Representation Learning Workshop

arXiv:2305.10247 [pdf, other]

Can Deep Network Balance Copy-Move Forgery Detection and Distinguishment?

Authors: Shizhen Chang

Abstract: Copy-move forgery detection is a crucial research area within digital image forensics, as it focuses on identifying instances where objects in an image are duplicated and placed in different locations. The detection of such forgeries is particularly important in contexts where they can be exploited for malicious purposes. Recent years have witnessed an increased interest in distinguishing between the original and duplicated objects in copy-move forgeries, accompanied by the development of larger-scale datasets to facilitate this task. However, existing approaches to copy-move forgery detection and source/target differentiation often involve two separate steps or the design of individual end-to-end networks for each task. In this paper, we propose an innovative method that employs the transformer architecture in an end-to-end deep neural network. Our method aims to detect instances of copy-move forgery while simultaneously localizing the source and target regions. By utilizing this approach, we address the challenges posed by multi-object copy-move scenarios and report whether there is a balance between the detection and differentiation tasks. To evaluate the performance of our proposed network, we conducted experiments on two publicly available copy-move datasets. The results and analysis aim to show the potential significance of our focus on balancing detection and distinguishment results and on transferring the trained model across different datasets in the field.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.08887 [pdf]

Covariate-distance Weighted Regression (CWR): A Case Study for Estimation of House Prices

Authors: Hone-Jay Chu, Po-Hung Chen, Sheng-Mao Chang, Muhammad Zeeshan Ali, Sumriti Ranjan Patra

Abstract: Geographically weighted regression (GWR) is a popular tool for modeling spatial heterogeneity in a regression model. However, the current weighting function used in GWR only considers the geographical distance, while the attribute similarity is totally ignored. In this study, we proposed a covariate weighting function that combines the geographical distance and attribute distance. The covariate-distance weighted regression (CWR) is the extension of GWR including geographical distance and attribute distance. House prices are affected by numerous factors, such as house age, floor area, and land use. A prediction model is used to help understand the characteristics of regional house prices. The CWR was used to understand the relationship between the house price and controlling factors. The CWR can consider the geographical and attribute distances, and produce accurate estimates of house price that preserve the weight matrix for geographical and attribute distance functions. Results show that the house attributes/conditions and the characteristics of the house, such as floor area and house age, might affect the house price. After factor selection, in which only house age and floor area of a building are considered, the RMSE of the CWR model can be improved by 2.9%-26.3% for skyscrapers when compared to the GWR. CWR can effectively reduce estimation errors from traditional spatial regression models and provide novel and feasible models for spatial estimation.

Submitted 14 May, 2023; originally announced May 2023.

arXiv:2305.08397 [pdf, ps, other]

Global quantum thermometry based on the optimal biased bound

Authors: Shoukang Chang, Wei Ye, Xuan Rao, Huan Zhang, Liqing Huang, Mengmeng Luo, Yuetao Chen, Qiang Ma, Shaoyan Gao

Abstract: Thermometry is a fundamental parameter estimation problem which is crucial in the development process of natural sciences. One way to solve this problem is the extensively used local thermometry theory, which makes use of the classical and quantum Cramér-Rao bound as benchmarks of thermometry precision. However, such a thermometry theory can only be used for decreasing temperature fluctuations around a known temperature value and can hardly tackle the precision thermometry problem over a wide temperature range. For this reason, we derive two basic bounds on thermometry precision in the global setting and further show their thermometry performance by two specific applications, i.e., noninteracting spin-1/2 gas and a general N-level thermal equilibrium quantum probe.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 7 pages, 2 figures

arXiv:2305.03305 [pdf, ps, other]

Random Tensor Inequalities and Tail bounds for Bivariate Random Tensor Means, Part II

Authors: Shih-Yu Chang

Abstract: This is Part II of our work about random tensor inequalities and tail bounds for bivariate random tensor means. After reviewing basic facts about random tensors, we first consider tail bounds with more general connection functions. Then, a general Lie-Trotter formula for tensors is derived and this formula is applied to establish tail bounds for bivariate random tensor means involving tensor logarithm. All random tensors studied in our Part I work are assumed to be positive definite (PD) random tensors, which are invertible tensors. In this Part II work, we generalize our tail bounds for bivariate random tensor means from positive definite (PD) random tensors to positive semidefinite (PSD) random tensors by defining Random Tensor Topology (RTT) and developing the limitation method based on RTT. Finally, we apply our theory to establish tail bounds and Löwner ordering relationships for bivariate random tensor means before and after two tensor data processing methods: data fusion and linear transform.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.03301 [pdf, ps, other]

Random Tensor Inequalities and Tail bounds for Bivariate Random Tensor Means, Part I

Authors: Shih-Yu Chang

Abstract: In this work, we apply the concept of operator connection to consider bivariate random tensor means. We first extend classical Markov and Chebyshev inequalities from a random variable to a random tensor by establishing Markov inequality for tensors and Chebyshev inequality for tensors. These inequalities are applied to establish tail bounds for bivariate random tensor means represented by operator perspectives based on various types of connection functions: tensor increasing functions, tensor decreasing functions, and tensor concavity functions. We also consider tail bounds relations for the summation and product of eigenvalues based on majorization ordering of eigenvalues of bivariate random tensor means. This is Part I of our work about random tensor inequalities and tail bounds for bivariate random tensor means. In our Part II, we will consider bivariate random tensor means with respect to non-invertible random tensors and their applications.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.02893 [pdf, other]

APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction

Authors: Quan Liu, Yunsong Zhou, Hongzi Zhu, Shan Chang, Minyi Guo

Abstract: For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes. Code is available at https://github.com/liuQuan98/APR.

Submitted 8 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2023

arXiv:2304.14897 [pdf]

doi 10.1103/PhysRevApplied.19.054045

First-principles Prediction of Potential Candidate Materials MCu$_3$X$_4$ (M = V, Nb, Ta; X = S, Se, Te) for Neuromorphic Computing

Authors: Baoxing Zhai, Ruiqing Cheng, Tianxing Wang, Li Liu, Lei Yin, Yao Wen, Hao Wang, Sheng Chang, Jun He

Abstract: Inspired by the neuro-synaptic frameworks in the human brain, neuromorphic computing is expected to overcome the bottleneck of traditional von-Neumann architecture and be used in artificial intelligence. Here, we predict a class of potential candidate materials, MCu$_3$X$_4$ (M = V, Nb, Ta; X = S, Se, Te), for neuromorphic computing applications through first-principles calculations based on density functional theory. We find that when Li atoms are inserted into MCu$_3$X$_4$, the systems transform from semiconductors to metals due to the considerable electron filling [~0.8 electrons per formula unit (f.u.)] and still maintain good structural stability. Meanwhile, the inserted Li atom also has a low diffusion barrier (~0.6 eV/f.u.), which ensures the feasibility of controlling the insertion/extraction of Li by gate voltage. These results establish that the system can achieve reversible switching between two stable memory states, i.e., high/low resistance states, indicating that it could potentially be used to design synaptic transistors to enable neuromorphic computing. Our work provides inspiration for advancing the search for candidate materials related to neuromorphic computing from the perspective of theoretical calculations.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 28+8 pages, 18 figures

Journal ref: Phys. Rev. Applied 19, 054045 (2023)

Showing 101–150 of 1,131 results for author: Chang, S