-
Online Tensor Learning: Computational and Statistical Trade-offs, Adaptivity and Optimal Regret
Authors:
Jian-Feng Cai,
**gyang Li,
Dong Xia
Abstract:
We investigate a generalized framework for estimating latent low-rank tensors in an online setting, encompassing both linear and generalized linear models. This framework offers a flexible approach for handling continuous or categorical variables. Additionally, we investigate two specific applications: online tensor completion and online binary tensor learning. To address these challenges, we propose the online Riemannian gradient descent algorithm, which converges linearly and recovers the low-rank component under appropriate conditions in all applications. Furthermore, we establish a precise entry-wise error bound for online tensor completion. Notably, our work is the first attempt to incorporate noise into the online low-rank tensor recovery task. In the presence of noise, we observe a surprising trade-off between computational and statistical aspects: increasing the step size accelerates convergence but leads to higher statistical error, whereas a smaller step size yields a statistically optimal estimator at the expense of slower convergence. Moreover, we conduct a regret analysis for online tensor regression. Under the fixed step size regime, we observe a trilemma among the convergence rate, the statistical error rate, and the regret. With an optimal choice of step size, we achieve an optimal regret of $O(\sqrt{T})$. Furthermore, we extend our analysis to the adaptive setting where the horizon $T$ is unknown. In this case, we demonstrate that by employing different step sizes, we can attain a statistically optimal error rate along with a regret of $O(\log T)$. Numerical results corroborate our theoretical findings.
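A deliberately simple toy problem can illustrate the step-size trade-off described above. The sketch below is not the paper's online Riemannian gradient descent. It assumes a hypothetical scalar model: online estimation of a mean with a constant step size `eta`. In that model the starting error contracts like $(1-\eta)^t$, while the stationary variance grows like $\eta\sigma^2/(2-\eta)$. The function name and its parameters are illustrative only.

```python
from typing import Tuple


def constant_step_tradeoff(eta: float, t: int, x0_err: float, sigma: float) -> Tuple[float, float]:
    """Hypothetical toy model, not the paper's algorithm.

    Observations y_s = theta + noise_s with Var(noise_s) = sigma**2, and the update
    x_{s+1} = x_s - eta * (x_s - y_s). The error e_s = x_s - theta then satisfies
    e_{s+1} = (1 - eta) * e_s + eta * noise_s.

    Returns (magnitude of the expected error after t steps, stationary error variance).
    """
    if not 0.0 < eta < 2.0:
        raise ValueError("eta must lie in (0, 2) for the recursion to be stable")
    # Optimization side: the initial error contracts geometrically at rate |1 - eta|.
    remaining = abs(1.0 - eta) ** t * abs(x0_err)
    # Statistical side: fixed point of v = (1 - eta)**2 * v + eta**2 * sigma**2.
    variance = eta * sigma ** 2 / (2.0 - eta)
    return remaining, variance


for eta in (0.01, 0.05, 0.2):
    remaining, variance = constant_step_tradeoff(eta, t=200, x0_err=1.0, sigma=1.0)
    print(f"eta={eta}: remaining start error {remaining:.2e}, stationary variance {variance:.4f}")
```

With `eta = 0.2`, the start error is negligible after 200 steps, but the stationary variance is about 0.11. With `eta = 0.01`, the variance drops to about 0.005, while roughly 13% of the start error remains.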
Submitted 10 July, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data Using Contrastive Learning with Varying Pre-Training Domains
Authors:
**** Cai,
Sudip Vhaduri,
Xiao Luo
Abstract:
Rapid discovery of new diseases, such as COVID-19, can enable a timely epidemic response, preventing large-scale spread and protecting public health. However, little research has addressed this problem. In this paper, we propose a contrastive learning-based modeling approach for discovering COVID-19 coughing and breathing patterns from non-COVID coughs. To validate our models, we conducted extensive experiments using four large audio datasets and one image dataset. We further explore the effects of different factors, such as domain relevance and augmentation order, on the pre-trained models. Our results show that the proposed model can effectively distinguish COVID-19 coughing and breathing from unlabeled data and labeled non-COVID coughs with an accuracy of up to 0.81 and 0.86, respectively. Findings from this work will guide future research on detecting outbreaks of new diseases early.
Submitted 2 June, 2023;
originally announced June 2023.
-
The First LHAASO Catalog of Gamma-Ray Sources
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
We present the first catalog of very-high-energy and ultra-high-energy gamma-ray sources detected by the Large High Altitude Air Shower Observatory (LHAASO). The catalog was compiled using 508 days of data collected by the Water Cherenkov Detector Array (WCDA) from March 2021 to September 2022 and 933 days of data recorded by the Kilometer Squared Array (KM2A) from January 2020 to September 2022. This catalog represents the main result from the most sensitive large-coverage gamma-ray survey of the sky above 1 TeV, covering declinations from $-20^{\circ}$ to $80^{\circ}$. In total, the catalog contains 90 sources with an extension smaller than $2^\circ$ and a detection significance of $> 5σ$. Based on our source association criteria, 32 new TeV sources are proposed in this study. Among the 90 sources, 43 are detected with ultra-high-energy ($E > 100$ TeV) emission at a significance level of $> 4σ$. We provide the position, extension, and spectral characteristics of all sources in this catalog.
Submitted 27 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Hate Raids on Twitch: Understanding Real-Time Human-Bot Coordinated Attacks in Live Streaming Communities
Authors:
Jie Cai,
Sagnik Chowdhury,
Hongyang Zhou,
Donghee Yvette Wohn
Abstract:
Online harassment and content moderation have been well documented in online communities. However, new contexts and systems bring new forms of harassment and call for new moderation mechanisms. This study focuses on hate raids, a form of real-time group attack in live streaming communities. Through a qualitative analysis of hate raid discussions in the Twitch subreddit (r/Twitch), we found that:

(1) hate raids, as human-bot coordinated group attacks, leverage the live streaming system to attack marginalized streamers and other potential groups with(out) breaking the rules;

(2) marginalized streamers suffer compound harms with insufficient support from the platform; and

(3) moderation strategies are overwhelmingly technical, but streamers still struggle to balance moderation and participation given their marginalization status and needs.

We use affordances as a lens to explain how hate raids happen in live streaming systems and propose moderation-by-design as a lens when developing new features or systems to mitigate the potential abuse of such designs.
Submitted 25 May, 2023;
originally announced May 2023.
-
Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training
Authors:
Jianfeng He,
Julian Salazar,
Kaisheng Yao,
Haoqi Li,
**glun Cai
Abstract:
End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of collecting speech-semantics pairs, especially when label domains change. Hence, we explore zero-shot E2E SLU, which learns E2E SLU without speech-semantics pairs, using only speech-text and text-semantics pairs instead. Previous work achieved zero-shot E2E SLU by pseudolabeling all speech-text transcripts with a natural language understanding (NLU) model learned on text-semantics corpora. However, this method requires the domains of speech-text and text-semantics to match, whereas they often mismatch because they are collected separately. Furthermore, using the entire collected speech-text corpus from any domain leads to imbalance and noise issues. To address these issues, we propose cross-modal selective self-training (CMSST). CMSST tackles imbalance by clustering in a joint space of the three modalities (speech, text, and semantics) and handles label noise with a selection network. We also introduce two benchmarks for zero-shot E2E SLU, covering matched and found speech (mismatched) settings. Experiments show that CMSST improves performance in both settings, with significantly reduced sample sizes and training time. Our code and data are released at https://github.com/amazon-science/zero-shot-E2E-slu.
Submitted 2 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Authors:
**glun Cai,
Monica Sunkara,
Xilai Li,
Anshu Bhatia,
Xiao Pan,
Sravan Bodapati
Abstract:
Masked Language Models (MLMs) have proven to be effective for second-pass rescoring in Automatic Speech Recognition (ASR) systems. In this work, we propose Masked Audio Text Encoder (MATE), a multi-modal masked language model rescorer that incorporates acoustic representations into the input space of the MLM. We adopt contrastive learning to effectively align the modalities by learning shared representations. We show that using a multi-modal rescorer benefits domain generalization of the ASR system when target-domain data is unavailable. MATE reduces word error rate (WER) by 4%-16% on in-domain and 3%-7% on out-of-domain datasets over the text-only baseline. Additionally, with a very limited amount of training data (0.8 hours), MATE achieves a WER reduction of 8%-23% over the first-pass baseline.
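The abstract states only that contrastive learning aligns the audio and text modalities; it does not give the exact objective. As a hedged illustration, the sketch below assumes a standard audio-to-text InfoNCE loss over a batch of paired embeddings, using cosine similarity and a hypothetical temperature `tau`. It is not MATE's published training code.

```python
import math
from typing import List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(audio: List[Sequence[float]], text: List[Sequence[float]], tau: float = 0.1) -> float:
    """Assumed InfoNCE objective: audio[i] should match text[i] against all text[j] in the batch."""
    n = len(audio)
    total = 0.0
    for i in range(n):
        logits = [_cosine(audio[i], text[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the matching pair
    return total / n
```

Matched pairs of orthogonal unit vectors give a loss of $\log(1+e^{-10}) \approx 4.5\times10^{-5}$. Swapping the text embeddings raises the loss to about 10, which is the signal that pulls paired representations together.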
Submitted 24 May, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Value Iteration Networks with Gated Summarization Module
Authors:
**yu Cai,
Jialong Li,
Mingyue Zhang,
Kenji Tei
Abstract:
In this paper, we address the challenges faced by Value Iteration Networks (VIN) in handling larger input maps and mitigating the impact of accumulated errors caused by increased iterations. We propose a novel approach, Value Iteration Networks with Gated Summarization Module (GS-VIN), which incorporates two main improvements: (1) employing an Adaptive Iteration Strategy in the Value Iteration module to reduce the number of iterations, and (2) introducing a Gated Summarization module to summarize the iterative process. The adaptive iteration strategy uses larger convolution kernels with fewer iterations, reducing network depth and increasing training stability while maintaining the accuracy of the planning process. The gated summarization module enables the network to emphasize the entire planning process, rather than relying solely on the final global planning outcome, by temporally and spatially resampling the entire planning process within the VI module. We conduct experiments on 2D grid-world path-finding problems and the Atari Mr. Pac-Man environment, demonstrating that GS-VIN outperforms the baseline in terms of single-step accuracy, planning success rate, and overall performance across different map sizes. Additionally, we analyze the relationship between input size, kernel size, and the number of iterations in VI-based models; this analysis applies to most VI-based models and offers insights for researchers and industrial deployment.
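Plain value iteration, which a VIN's Value Iteration module emulates with convolutions, shows why larger kernels allow fewer iterations. The sketch below is not GS-VIN. It assumes a hypothetical setup:

- an open $n \times n$ grid with a single goal;
- a unit step cost;
- moves to any cell inside a $(2r+1)\times(2r+1)$ window.

It counts how many synchronous sweeps change the value map before convergence.

```python
from typing import List


def vi_sweeps(n: int, radius: int) -> int:
    """Synchronous value iteration on an open n x n grid with the goal at (0, 0).

    Hypothetical setup: each move reaches any cell within Chebyshev distance
    `radius` (a (2*radius+1)-wide "kernel") and costs 1. Returns the number of
    sweeps that changed at least one value before the value map converged.
    """
    neg_inf = float("-inf")
    values: List[List[float]] = [[neg_inf] * n for _ in range(n)]
    values[0][0] = 0.0  # terminal goal state, kept fixed
    sweeps = 0
    while True:
        new_values = [row[:] for row in values]  # synchronous (Jacobi-style) update
        changed = False
        for i in range(n):
            for j in range(n):
                if i == 0 and j == 0:
                    continue
                best = neg_inf
                for di in range(-radius, radius + 1):
                    for dj in range(-radius, radius + 1):
                        a, b = i + di, j + dj
                        if 0 <= a < n and 0 <= b < n:
                            best = max(best, values[a][b] - 1.0)  # Bellman backup
                if best != values[i][j]:
                    new_values[i][j] = best
                    changed = True
        if not changed:
            return sweeps
        values = new_values
        sweeps += 1


for r in (1, 2, 3):
    print(r, vi_sweeps(9, r))  # prints "1 8", "2 4", "3 3"
```

On a 9 × 9 grid, the number of sweeps is the goal-to-corner distance (8) divided by the kernel radius, rounded up. Widening the kernel therefore trades per-iteration work for fewer iterations.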
Submitted 16 May, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression
Authors:
Yinan Shen,
**gyang Li,
Jian-Feng Cai,
Dong Xia
Abstract:
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since the robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed; unfortunately, they fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a projected sub-gradient descent algorithm for both sparse linear regression and low-rank linear regression. The algorithm is not only computationally efficient, with linear convergence, but also statistically optimal, whether the noise is Gaussian or heavy-tailed with a finite $(1+\epsilon)$-th moment. The convergence theory is established for a general framework, and its specific applications to the absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, our analysis reveals a surprising phenomenon of two-phase convergence. In phase one, the algorithm behaves as in typical non-smooth optimization, requiring gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, as already observed in the existing literature. Interestingly, during phase two, the algorithm converges linearly as if minimizing a smooth and strongly convex objective function, and thus a constant stepsize suffices. The phase-two convergence rests on the smoothing effect of random noise on the non-smooth robust losses in a region close to, but not too close to, the truth. Numerical simulations confirm our theoretical discovery and showcase the superiority of our algorithm over prior methods.
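The two-phase stepsize schedule can be sketched for the sparse case with the absolute loss. This is a minimal illustration under stated assumptions, not the authors' implementation:

- projection is hard thresholding onto $s$-sparse vectors;
- phase one uses a geometrically decaying step `eta0 * q**t`;
- phase two uses a constant step `eta_const`.

All hyperparameter values, and the Gaussian design and Laplace noise in the hypothetical `demo_error` helper, are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def hard_threshold(beta: Sequence[float], s: int) -> List[float]:
    """Projection onto s-sparse vectors: keep the s largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(beta)), key=lambda k: abs(beta[k]), reverse=True)[:s]
    out = [0.0] * len(beta)
    for k in keep:
        out[k] = beta[k]
    return out


def lad_subgradient(X: Sequence[Sequence[float]], y: Sequence[float], beta: Sequence[float]) -> List[float]:
    """A subgradient of the absolute loss (1/n) * sum_i |y_i - x_i^T beta|."""
    n = len(X)
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - sum(a * b for a, b in zip(xi, beta))
        sgn = (r > 0) - (r < 0)  # sign of the residual; 0 is a valid choice when r == 0
        for k, xik in enumerate(xi):
            g[k] -= sgn * xik / n
    return g


def projected_subgradient(X, y, s, eta0=0.5, q=0.85, phase_one=30, eta_const=0.05, phase_two=30):
    """Phase one: geometrically decaying steps. Phase two: a constant step (hypothetical values)."""
    beta = [0.0] * len(X[0])
    for t in range(phase_one + phase_two):
        eta = eta0 * q ** t if t < phase_one else eta_const
        g = lad_subgradient(X, y, beta)
        beta = hard_threshold([b - eta * gk for b, gk in zip(beta, g)], s)
    return beta


def demo_error(seed: int = 0, n: int = 400, d: int = 20, noise_scale: float = 0.1) -> float:
    """Simulate a 2-sparse model with Laplace noise and return the Euclidean estimation error."""
    rng = random.Random(seed)
    beta_true = [1.0, -1.0] + [0.0] * (d - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Laplace noise as the difference of two i.i.d. exponentials with mean noise_scale.
    y = [sum(a * b for a, b in zip(xi, beta_true))
         + rng.expovariate(1.0 / noise_scale) - rng.expovariate(1.0 / noise_scale)
         for xi in X]
    beta_hat = projected_subgradient(X, y, s=2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(beta_hat, beta_true)))
```

In this toy setup, the constant phase-two step is stable only because the Laplace noise (scale $b$) makes the expected absolute loss locally quadratic, with curvature about $1/b$ near the truth. With noise-free data, a constant step would instead keep oscillating at a scale proportional to the step.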
Submitted 10 May, 2023;
originally announced May 2023.
-
Measurement of ultra-high-energy diffuse gamma-ray emission of the Galactic plane from 10 TeV to 1 PeV with LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The diffuse Galactic $γ$-ray emission, mainly produced via interactions between cosmic rays and the interstellar medium and/or radiation field, is an important probe of the distribution, propagation, and interaction of cosmic rays in the Milky Way. In this work, we report measurements of diffuse $γ$-rays from the Galactic plane at energies between 10 TeV and 1 PeV, using the Kilometer Squared Array (KM2A) of the Large High Altitude Air Shower Observatory (LHAASO). Diffuse emissions from the inner ($15^{\circ}<l<125^{\circ}$, $|b|<5^{\circ}$) and outer ($125^{\circ}<l<235^{\circ}$, $|b|<5^{\circ}$) Galactic plane are detected with $29.1σ$ and $12.7σ$ significance, respectively. The outer Galactic plane diffuse emission is detected for the first time in the very- to ultra-high-energy domain ($E>10$ TeV). The energy spectrum in the inner Galaxy region can be described by a power-law function with an index of $-2.99\pm0.04$. This differs from the curved spectrum expected from hadronic interactions between locally measured cosmic rays and the line-of-sight integrated gas content. Furthermore, the measured flux is higher than the prediction by a factor of $\sim3$. A similar spectrum with an index of $-2.99\pm0.07$ is found in the outer Galaxy region, and the absolute flux for $10\lesssim E\lesssim60$ TeV is again higher than the prediction for hadronic cosmic-ray interactions. The latitude distributions of the diffuse emission are consistent with the gas distribution, while the longitude distributions show clear deviations from it. The LHAASO measurements imply that either additional emission sources exist or cosmic-ray intensities vary spatially.
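A power-law index such as the $-2.99$ quoted above is the slope of the spectrum in log-log space. The sketch below shows only that relationship. It assumes flux points are already available and fits $\log F$ against $\log E$ by ordinary least squares. A real analysis would typically fit the spectrum with a likelihood that accounts for the detector response and the uncertainties; this hypothetical helper ignores both.

```python
import math
from typing import Sequence


def fit_power_law_index(energies: Sequence[float], fluxes: Sequence[float]) -> float:
    """Least-squares slope of log(flux) vs log(energy), i.e. the index in F(E) = N0 * E**index."""
    xs = [math.log(e) for e in energies]
    ys = [math.log(f) for f in fluxes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


energies = [10.0, 20.0, 50.0, 100.0, 300.0, 1000.0]  # TeV, hypothetical sample points
fluxes = [1e-12 * (e / 10.0) ** -2.99 for e in energies]  # noise-free toy spectrum
print(round(fit_power_law_index(energies, fluxes), 6))  # -2.99
```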
Submitted 19 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation
Authors:
Nilaksh Das,
Monica Sunkara,
Sravan Bodapati,
**glun Cai,
Devang Kulshreshtha,
Jeff Farris,
Katrin Kirchhoff
Abstract:
End-to-end ASR models trained on large amounts of data tend to be implicitly biased towards the language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture and eliminating the acoustic input to perform log-linear interpolation with the text-only posterior. However, for CTC-based ASR, it is not as straightforward to decouple the model into such acoustic and language components, as CTC log-posteriors are computed in a non-autoregressive manner. In this work, we propose a novel ILME technique for CTC-based ASR models. Our method iteratively masks the audio timesteps to estimate a pseudo log-likelihood of the internal LM by accumulating log-posteriors for only the masked timesteps. Extensive evaluation across multiple out-of-domain datasets reveals that the proposed approach improves WER by up to 9.8% and OOV F1-score by up to 24.6% relative to Shallow Fusion when only text data from the target domain is available. In the case of zero-shot domain adaptation, with no access to any target-domain data, we demonstrate that removing the source-domain bias with ILME can still outperform Shallow Fusion, improving WER by up to 9.3% relative.
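ILME-based fusion typically scores a hypothesis $y$ for audio $x$ as $\log P(y\mid x) - \lambda_{\mathrm{ILM}} \log P_{\mathrm{ILM}}(y) + \lambda_{\mathrm{LM}} \log P_{\mathrm{LM}}(y)$. Setting $\lambda_{\mathrm{ILM}} = 0$ recovers Shallow Fusion. The sketch below applies that log-linear rule to an N-best list. It assumes the internal-LM score is already available; estimating that score for CTC via timestep masking is the paper's contribution and is not reproduced here. The weights and scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    text: str
    log_p_asr: float  # log P(y | x) from the CTC model
    log_p_ilm: float  # estimated internal-LM log-likelihood (assumed given)
    log_p_lm: float   # external LM log-likelihood, trained on target-domain text


def fused_score(h: Hypothesis, lambda_ilm: float, lambda_lm: float) -> float:
    """Log-linear fusion: subtract the source-domain internal LM, add the external LM."""
    return h.log_p_asr - lambda_ilm * h.log_p_ilm + lambda_lm * h.log_p_lm


def rescore(nbest: Sequence[Hypothesis], lambda_ilm: float, lambda_lm: float) -> Hypothesis:
    """Pick the highest-scoring hypothesis; lambda_ilm = 0 reduces to Shallow Fusion."""
    return max(nbest, key=lambda h: fused_score(h, lambda_ilm, lambda_lm))


nbest = [Hypothesis("A", -5.0, -2.0, -6.0), Hypothesis("B", -5.4, -5.0, -6.0)]
print(rescore(nbest, lambda_ilm=0.0, lambda_lm=0.5).text)  # Shallow Fusion picks "A"
print(rescore(nbest, lambda_ilm=0.3, lambda_lm=0.5).text)  # ILME fusion picks "B"
```

In this toy N-best list, hypothesis A wins under Shallow Fusion partly because the internal LM scores it highly. Subtracting that internal-LM score lets hypothesis B win.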
Submitted 5 May, 2023;
originally announced May 2023.
-
DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition
Authors:
Zeqi Tan,
Shen Huang,
Zixia Jia,
Jiong Cai,
Yinghui Li,
Weiming Lu,
Yueting Zhuang,
Kewei Tu,
Pengjun Xie,
Fei Huang,
Yong Jiang
Abstract:
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios, and it inherits the semantic ambiguity and low-context setting of the MultiCoNER I task. To cope with these problems, the previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers. However, they still suffer from insufficient knowledge, limited context length, and a single retrieval strategy. In this paper, our team DAMO-NLP proposes a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER. We perform error analysis on the previous top systems and reveal that their performance bottleneck lies in insufficient knowledge. We also discover that the limited context length makes the retrieved knowledge invisible to the model. To enhance the retrieval context, we incorporate the entity-centric Wikidata knowledge base, while using an infusion approach to broaden the contextual scope of the model. We also explore various search strategies and refine the quality of the retrieved knowledge. Our system wins 9 out of 13 tracks in the MultiCoNER II shared task; the dataset, code, and scripts of our system will be released at https://github.com/modelscope/AdaSeq/tree/master/examples/U-RaNER. Additionally, we compare our system with ChatGPT, one of the large language models that have unlocked strong capabilities on many tasks. The results show that there is still much room for improvement for ChatGPT on the extraction task.
Submitted 16 May, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
A Preconditioned Riemannian Gradient Descent Algorithm for Low-Rank Matrix Recovery
Authors:
Fengmiao Bian,
Jian-Feng Cai,
Rui Zhang
Abstract:
The low-rank matrix recovery problem arises in various fields, including signal processing, machine learning, and imaging science. The Riemannian gradient descent (RGD) algorithm has proven to be an efficient algorithm for solving this problem. In this paper, we present a preconditioned Riemannian gradient descent (PRGD) algorithm for low-rank matrix recovery. The preconditioner, noted for its simplicity and computational efficiency, is constructed by weighting the (i,j)-th entry of the gradient matrix according to the norms of the i-th row and the j-th column. We establish a theoretical recovery guarantee for PRGD under the restricted isometry property assumption. Experimental results indicate that PRGD can accelerate RGD by up to tenfold in solving low-rank matrix recovery problems such as matrix completion.
Submitted 4 May, 2023;
originally announced May 2023.
-
Explicit Correspondence Matching for Generalizable Neural Radiance Fields
Authors:
Yuedong Chen,
Haofei Xu,
Qianyi Wu,
Chuanxia Zheng,
Tat-Jen Cham,
Jianfei Cai
Abstract:
We present a new generalizable NeRF method that can directly generalize to new unseen scenarios and perform novel view synthesis with as few as two source views. The key to our approach lies in explicitly modeled correspondence matching information, which provides a geometry prior for predicting NeRF color and density for volume rendering. The explicit correspondence matching is quantified by the cosine similarity between image features sampled at the 2D projections of a 3D point on different views, which provides reliable cues about the surface geometry. Unlike previous methods, where image features are extracted independently for each view, we model the cross-view interactions via Transformer cross-attention, which greatly improves the feature matching quality. Our method achieves state-of-the-art results on different evaluation settings. The experiments also show a strong correlation between our learned cosine feature similarity and volume density, demonstrating the effectiveness and superiority of our proposed method. Code is available at https://github.com/donydchen/matchnerf
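The matching cue described above works in three steps:

1. Project a 3D point into each source view with a pinhole camera.
2. Look up the image feature at that pixel.
3. Compare the features by cosine similarity.

The sketch below is not the authors' code. It makes three assumptions: nearest-pixel lookup instead of interpolated sampling, hand-made feature maps instead of Transformer features, and averaging over view pairs. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mat = Sequence[Sequence[float]]
FeatureMap = List[List[List[float]]]  # H x W x C, indexed [row][col][channel]


def project(K: Mat, R: Mat, t: Sequence[float], X: Sequence[float]) -> Tuple[float, float]:
    """Pinhole projection x ~ K (R X + t); assumes the point lies in front of the camera."""
    cam = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    pix = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    return pix[0] / pix[2], pix[1] / pix[2]


def sample_nearest(feat: FeatureMap, u: float, v: float) -> List[float]:
    """Nearest-pixel feature lookup, clamped to the image border (u = column, v = row)."""
    h, w = len(feat), len(feat[0])
    row = min(max(int(round(v)), 0), h - 1)
    col = min(max(int(round(u)), 0), w - 1)
    return feat[row][col]


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norms, eps)


def matching_score(X: Sequence[float], views) -> float:
    """Mean pairwise cosine similarity of features sampled at X's projections.

    `views` is a list of (K, R, t, feature_map) tuples, one per source view.
    """
    if len(views) < 2:
        raise ValueError("need at least two views to compare")
    feats = [sample_nearest(feat, *project(K, R, t, X)) for K, R, t, feat in views]
    sims = [cosine(feats[i], feats[j])
            for i in range(len(feats)) for j in range(i + 1, len(feats))]
    return sum(sims) / len(sims)
```

A high score suggests the point lies on a surface that is visible and consistent across views. The paper relates this cue to volume density.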
Submitted 24 April, 2023;
originally announced April 2023.
-
Active RIS-aided EH-NOMA Networks: A Deep Reinforcement Learning Approach
Authors:
Zhaoyuan Shi,
Huabing Lu,
Xianzhong Xie,
Helin Yang,
Chongwen Huang,
Jun Cai,
Zhiguo Ding
Abstract:
An active reconfigurable intelligent surface (RIS)-aided multi-user downlink communication system is investigated, where non-orthogonal multiple access (NOMA) is employed to improve spectral efficiency, and the active RIS is powered by energy harvesting (EH). The problem of jointly controlling the RIS's amplification matrix and phase shift matrix is formulated to maximize the communication success ratio while considering the quality of service (QoS) requirements of users, the dynamic communication state, and the dynamic available energy of the RIS. To tackle this non-convex problem, a cascaded deep learning algorithm, namely long short-term memory-deep deterministic policy gradient (LSTM-DDPG), is designed. First, an advanced LSTM-based algorithm is developed to predict users' dynamic communication state. Then, based on the prediction results, a DDPG-based algorithm is proposed to jointly control the amplification matrix and phase shift matrix of the RIS. Finally, simulation results verify the prediction accuracy of the proposed LSTM algorithm and demonstrate that the LSTM-DDPG algorithm has a significant advantage over other benchmark algorithms in terms of communication success ratio.
Submitted 11 April, 2023;
originally announced April 2023.
-
Engineering artificial atomic systems of giant electric dipole moment
Authors:
Baiyi Yu,
Yaoming Chu,
Ralf Betzholz,
Shaoliang Zhang,
Jianming Cai
Abstract:
The electric dipole moment (EDM) plays a crucial role in determining the interaction strength of an atom with electric fields, making it paramount to quantum technologies based on coherent atomic control. We propose a scheme for engineering the potential in a Paul trap to realize a two-level quantum system with a giant EDM formed by the motional states of a trapped electron. We show that, under realistic experimental conditions, the EDM can significantly exceed those attainable with Rydberg atoms. Furthermore, we show that such artificial atomic dipoles can be efficiently initialized, read out, and coherently controlled, thereby providing a potential platform for quantum technologies such as ultrahigh-sensitivity electric-field sensing.
Submitted 21 April, 2023;
originally announced April 2023.
-
Accelerated quantum control in a three-level system by jumping along the geodesics
Authors:
Musang Gong,
Min Yu,
Ralf Betzholz,
Yaoming Chu,
Pengcheng Yang,
Zhenyu Wang,
Jianming Cai
Abstract:
In a solid-state spin system, we experimentally demonstrate a protocol for quantum-state population transfer with an improved efficiency compared to traditional stimulated Raman adiabatic passage (STIRAP). Using the ground-state triplet of the nitrogen-vacancy center in diamond, we show that the required evolution time for high-fidelity state transfer can be reduced by almost one order of magnitude. Furthermore, we establish an improved robustness against frequency detuning caused by magnetic noise as compared to STIRAP. These results provide a powerful tool for coherent spin manipulation in the context of quantum sensing and quantum computation.
Submitted 20 April, 2023;
originally announced April 2023.
-
The Design Space of Generative Models
Authors:
Meredith Ringel Morris,
Carrie J. Cai,
Jess Holbrook,
Chinmay Kulkarni,
Michael Terry
Abstract:
Card et al.'s classic paper "The Design Space of Input Devices" established the value of design spaces as a tool for HCI analysis and invention. We posit that developing design spaces for emerging pre-trained, generative AI models is necessary for supporting their integration into human-centered systems and practices. We explore what it means to develop an AI model design space by proposing two design spaces relating to generative AI models: the first considers how HCI can impact generative models (i.e., interfaces for models), and the second considers how generative models can impact HCI (i.e., models as an HCI prototyping material).
Submitted 15 April, 2023;
originally announced April 2023.
-
Signatures of Fractional Quantum Anomalous Hall States in Twisted MoTe2 Bilayer
Authors:
Jiaqi Cai,
Eric Anderson,
Chong Wang,
Xiaowei Zhang,
Xiaoyu Liu,
William Holtzmann,
Yinong Zhang,
Fengren Fan,
Takashi Taniguchi,
Kenji Watanabe,
Ying Ran,
Ting Cao,
Liang Fu,
Di Xiao,
Wang Yao,
Xiaodong Xu
Abstract:
The interplay between spontaneous symmetry breaking and topology can result in exotic quantum states of matter. A celebrated example is the quantum anomalous Hall (QAH) state, which exhibits an integer quantum Hall effect at zero magnetic field thanks to its intrinsic ferromagnetism. In the presence of strong electron-electron interactions, exotic fractional-QAH (FQAH) states at zero magnetic field can emerge. These states could host fractional excitations, including non-Abelian anyons - crucial building blocks for topological quantum computation. Flat Chern bands are widely considered a desirable venue to realize the FQAH state. For this purpose, twisted transition metal dichalcogenide homobilayers in rhombohedral stacking have recently been predicted to be a promising material platform. Here, we report experimental signatures of FQAH states in 3.7-degree twisted MoTe2 bilayer. Magnetic circular dichroism measurements reveal robust ferromagnetic states at fractionally hole filled moiré minibands. Using trion photoluminescence as a sensor, we obtain a Landau fan diagram which shows linear shifts in carrier densities corresponding to the ν=-2/3 and -3/5 ferromagnetic states with applied magnetic field. These shifts match the Streda formula dispersion of FQAH states with fractionally quantized Hall conductance of -2/3$e^2/h$ and -3/5$e^2/h$, respectively. Moreover, the ν=-1 state exhibits a dispersion corresponding to Chern number -1, consistent with the predicted QAH state. In comparison, several non-ferromagnetic states on the electron doping side do not disperse, i.e., are trivial correlated insulators. The observed topological states can be further electrically driven into topologically trivial states. Our findings provide clear evidence of the long-sought FQAH states, putting forward MoTe2 moiré superlattices as a fascinating platform for exploring fractional excitations.
Submitted 18 April, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
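The Streda formula referred to above links the field-induced shift in a gapped state's carrier density to its Hall conductance: dn/dB = sigma_xy / e. Writing sigma_xy = C e^2/h gives dn/dB = C e/h. The sketch below computes the expected slope for the three values quoted in the abstract (C = -1, -2/3, -3/5) using the exact SI constants. Treating the sign of C directly as the sign of the density shift is a simplifying assumption about the hole-doping convention.

```python
from fractions import Fraction

E_CHARGE = 1.602176634e-19   # elementary charge e, in C (exact SI value)
PLANCK_H = 6.62607015e-34    # Planck constant h, in J s (exact SI value)

def streda_slope_cm2(chern):
    """Streda slope dn/dB = C * e / h, returned in cm^-2 per tesla."""
    slope_m2 = float(chern) * E_CHARGE / PLANCK_H  # in m^-2 T^-1
    return slope_m2 * 1e-4                         # convert m^-2 -> cm^-2

for c in (Fraction(-1), Fraction(-2, 3), Fraction(-3, 5)):
    print(f"C = {c}: dn/dB = {streda_slope_cm2(c):.3e} cm^-2/T")
```

Because the slope is linear in C, the fractional states disperse at exactly 2/3 and 3/5 of the integer QAH slope. Comparing the measured fan-diagram slopes against these ratios is what makes them diagnostic.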
-
Realizing Immersive Communications in Human Digital Twin by Edge Computing Empowered Tactile Internet: Visions and Case Study
Authors:
Hao Xiang,
Changyan Yi,
Kun Wu,
Jiayuan Chen,
Jun Cai,
Dusit Niyato,
Xuemin Shen
Abstract:
Human digital twin (HDT) is expected to revolutionize future human lifestyles and to prompt the development of advanced human-centric applications (e.g., Metaverse) by bridging physical and virtual spaces. However, the fulfillment of HDT poses stringent demands on pervasive connectivity, real-time feedback, multi-modal data transmission and ultra-high reliability, which call for enabling immersive communications. In this article, we shed light on the design of an immersive communication framework for HDT by edge computing empowered tactile Internet (namely IC-HDT-ECoTI). Aiming at offering strong interactions and extremely immersive quality of experience, we introduce the system architecture of IC-HDT-ECoTI, and analyze its major design requirements and challenges. Moreover, we present core guidelines and detailed steps for system implementations. In addition, we conduct an experimental study based on our recently built testbed, which shows a particular use case of IC-HDT-ECoTI in physical therapy, and the obtained results indicate that the proposed framework can significantly improve the effectiveness of the system. Finally, we conclude this article with a brief discussion of open issues and future directions.
Submitted 17 June, 2024; v1 submitted 14 April, 2023;
originally announced April 2023.
-
AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics
Authors:
Shan Jia,
Mingzhen Huang,
Zhou Zhou,
Yan Ju,
Jialing Cai,
Siwei Lyu
Abstract:
Recent advancements in language-image models have led to the development of highly realistic images that can be generated from textual descriptions. However, the increased visual quality of these generated images poses a potential threat to the field of media forensics. This paper aims to investigate the level of challenge that language-image generation models pose to media forensics. To achieve this, we propose a new approach that leverages the DALL-E2 language-image model to automatically generate and splice masked regions guided by a text prompt. To ensure the creation of realistic manipulations, we have designed an annotation platform with human checking to verify reasonable text prompts. This approach has resulted in the creation of a new image dataset called AutoSplice, containing 5,894 manipulated and authentic images. Specifically, we have generated a total of 3,621 images by locally or globally manipulating real-world image-caption pairs, which we believe will provide a valuable resource for developing generalized detection methods in this area. The dataset is evaluated under two media forensic tasks: forgery detection and localization. Our extensive experiments show that most media forensic models struggle to detect the AutoSplice dataset as an unseen manipulation. However, when fine-tuned models are used, they exhibit improved performance in both tasks.
Submitted 13 April, 2023;
originally announced April 2023.
-
Generative Agents: Interactive Simulacra of Human Behavior
Authors:
Joon Sung Park,
Joseph C. O'Brien,
Carrie J. Cai,
Meredith Ringel Morris,
Percy Liang,
Michael S. Bernstein
Abstract:
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
Submitted 5 August, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
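One way to picture the dynamic retrieval step is to rank stored memories by a combined score. The minimal sketch below assumes an unweighted sum of three terms:

- recency, as exponential decay in hours since last access
- a stored importance value
- relevance, as cosine similarity between a memory's embedding and the query embedding

The `Memory` record, the decay factor, and the two-dimensional toy embeddings are hypothetical. They are not taken from the paper's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    """Hypothetical memory record: text, embedding vector, importance in [0, 1]."""
    text: str
    embedding: list
    importance: float
    hours_since_access: float

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(memories, query_embedding, k=2, decay=0.99):
    """Return the k memories with the highest recency + importance + relevance."""
    def score(m):
        recency = decay ** m.hours_since_access       # older memories fade
        relevance = cosine(m.embedding, query_embedding)
        return recency + m.importance + relevance
    return sorted(memories, key=score, reverse=True)[:k]

memories = [
    Memory("planned a Valentine's Day party", [1.0, 0.0], 0.9, 2.0),
    Memory("ate breakfast", [0.0, 1.0], 0.1, 1.0),
    Memory("invited a neighbor to the party", [0.9, 0.1], 0.7, 5.0),
]
for m in retrieve(memories, query_embedding=[1.0, 0.0]):
    print(m.text)
```

The recent but unimportant and irrelevant breakfast memory is outranked by the two party-related memories. Combining the three signals is meant to produce exactly this behavior.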
-
Flipbot: Learning Continuous Paper Flipping via Coarse-to-Fine Exteroceptive-Proprioceptive Exploration
Authors:
Chao Zhao,
Chunli Jiang,
Junhao Cai,
Michael Yu Wang,
Hongyu Yu,
Qifeng Chen
Abstract:
This paper tackles the task of singulating and grasping paper-like deformable objects. We refer to such tasks as paper-flipping. In contrast to manipulating deformable objects that lack compression strength (such as shirts and ropes), minor variations in the physical properties of the paper-like deformable objects significantly impact the results, making manipulation highly challenging. Here, we present Flipbot, a novel solution for flipping paper-like deformable objects. Flipbot allows the robot to capture object physical properties by integrating exteroceptive and proprioceptive perceptions that are indispensable for manipulating deformable objects. Furthermore, by incorporating a proposed coarse-to-fine exploration process, the system is capable of learning the optimal control parameters for effective paper-flipping through proprioceptive and exteroceptive inputs. We deploy our method on a real-world robot with a soft gripper and learn in a self-supervised manner. The resulting policy demonstrates the effectiveness of Flipbot on paper-flipping tasks with various settings beyond the reach of prior studies, including but not limited to flipping pages throughout a book and emptying paper sheets in a box.
Submitted 5 April, 2023;
originally announced April 2023.
-
Learn to Grasp via Intention Discovery and its Application to Challenging Clutter
Authors:
Chao Zhao,
Chunli Jiang,
Junhao Cai,
Hongyu Yu,
Michael Yu Wang,
Qifeng Chen
Abstract:
Humans excel in grasping objects through diverse and robust policies, many of which are so probabilistically rare that exploration-based learning methods hardly ever observe and learn them. Inspired by the human learning process, we propose a method to extract and exploit latent intents from demonstrations, and then learn diverse and robust grasping policies through self-exploration. The resulting policy can grasp challenging objects in various environments with an off-the-shelf parallel gripper. The key component is a learned intention estimator, which maps gripper pose and visual sensory input to a set of sub-intents covering important phases of the grasping movement. Sub-intents can be used to build an intrinsic reward to guide policy learning. The learned policy demonstrates remarkable zero-shot generalization from simulation to the real world while retaining its robustness against states that have never been encountered during training, novel objects such as protractors and user manuals, and environments such as the cluttered conveyor.
Submitted 5 April, 2023;
originally announced April 2023.
-
ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks
Authors:
Chao Zhao,
Shuai Yuan,
Chunli Jiang,
Junhao Cai,
Hongyu Yu,
Michael Yu Wang,
Qifeng Chen
Abstract:
This letter introduces ERRA, an embodied learning architecture that enables robots to jointly obtain three fundamental capabilities (reasoning, planning, and interaction) for solving long-horizon language-conditioned manipulation tasks. ERRA is based on tightly-coupled probabilistic inferences at two granularity levels. Coarse-resolution inference is formulated as sequence generation through a large language model, which infers action language from natural language instruction and environment state. The robot then zooms to the fine-resolution inference part to perform the concrete action corresponding to the action language. Fine-resolution inference is constructed as a Markov decision process, which takes action language and environmental sensing as observations and outputs the action. The results of action execution in environments provide feedback for subsequent coarse-resolution reasoning. Such coarse-to-fine inference allows the robot to decompose and achieve long-horizon tasks interactively. In extensive experiments, we show that ERRA can complete various long-horizon manipulation tasks specified by abstract language instructions. We also demonstrate successful generalization to novel but similar natural language instructions.
Submitted 5 April, 2023;
originally announced April 2023.
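The coarse-to-fine loop can be illustrated with a toy example. In this minimal sketch, `coarse_inference` is a hypothetical rule-based stand-in for the large language model, and `fine_inference` is a hypothetical stand-in for the learned Markov-decision-process policy. The drawer task and state keys are invented for illustration. The sketch reproduces only the control flow described in the abstract:

1. The coarse level emits action language.
2. The fine level executes it.
3. The updated environment state feeds back into the next round of coarse reasoning.

```python
def coarse_inference(instruction, state):
    """Hypothetical stand-in for the LLM: map instruction + state to action language."""
    if not state["drawer_open"]:
        return "open the drawer"
    if not state["apple_in_drawer"]:
        return "put the apple in the drawer"
    return "done"

def fine_inference(action_language, state):
    """Hypothetical stand-in for the MDP policy: execute one action, return the new state."""
    new_state = dict(state)
    if action_language == "open the drawer":
        new_state["drawer_open"] = True
    elif action_language == "put the apple in the drawer":
        new_state["apple_in_drawer"] = True
    return new_state

def run(instruction, state, max_steps=10):
    """Alternate coarse and fine inference; each execution result feeds the next plan."""
    trace = []
    for _ in range(max_steps):
        action = coarse_inference(instruction, state)
        if action == "done":
            break
        state = fine_inference(action, state)
        trace.append(action)
    return trace, state

trace, final_state = run("put the apple in the drawer",
                         {"drawer_open": False, "apple_in_drawer": False})
print(trace)
```

The coarse planner is re-queried after every executed step, never once up front. This is what lets a long-horizon instruction be decomposed interactively, since a failed or partial execution would simply show up in the next state.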
-
Programming Correlated Magnetic States via Gate Controlled Moiré Geometry
Authors:
Eric Anderson,
Feng-Ren Fan,
Jiaqi Cai,
William Holtzmann,
Takashi Taniguchi,
Kenji Watanabe,
Di Xiao,
Wang Yao,
Xiaodong Xu
Abstract:
Understanding quantum many-body systems is at the heart of condensed matter physics. The ability to control the underlying lattice geometry of a system, and thus its many-body interactions, would enable the realization of and transition between emergent quantum ground states. Here, we report in-situ gate switching between honeycomb and triangular lattice geometries of an electron many-body Hamiltonian in R-stacked MoTe2 moiré bilayers, resulting in switchable magnetic exchange interactions. At zero electric field, we observe a correlated ferromagnetic insulator near one hole per moiré unit cell (ν=-1), i.e., a quarter-filled honeycomb lattice, with a widely tunable Curie temperature up to 14 K. Fully polarizing layer pseudospin via electric field switches the system into a half-filled triangular lattice with antiferromagnetic interactions. Further doping this layer-polarized superlattice introduces carriers into the empty layer, tuning the antiferromagnetic exchange interaction back to ferromagnetic. Our work demonstrates R-stacked MoTe2 moirés to be a new laboratory for engineering correlated states with nontrivial topology.
Submitted 29 March, 2023;
originally announced March 2023.
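The two filling fractions quoted above follow from counting states per moiré unit cell. The minimal sketch below assumes each lattice site holds two spin states.

- Honeycomb lattice: two sites per unit cell, so four states. One hole gives 1/4 filling.
- Layer-polarized triangular lattice: one site per cell, so two states. The same hole gives 1/2 filling.

```python
from fractions import Fraction

def filling_fraction(holes_per_cell, sites_per_cell, spin_states=2):
    """Fraction of the available states per moiré unit cell that are occupied."""
    return Fraction(holes_per_cell, sites_per_cell * spin_states)

# nu = -1: one hole per moiré unit cell in both geometries.
print("honeycomb:", filling_fraction(1, sites_per_cell=2))   # two sublattice sites
print("triangular:", filling_fraction(1, sites_per_cell=1))  # one site per cell
```

The electric field therefore leaves the carrier density unchanged. What it changes is the number of sites available per cell, which turns the quarter-filled honeycomb into a half-filled triangular lattice.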
-
Planar 3-way Edge Perfect Matching Leads to A Holant Dichotomy
Authors:
**-Yi Cai,
Austen Z. Fan
Abstract:
We prove a complexity dichotomy theorem for a class of Holant problems on planar 3-regular bipartite graphs. The complexity dichotomy states that for every weighted constraint function $f$ defining the problem (the weights can even be negative), the problem is either computable in polynomial time if $f$ satisfies a tractability criterion, or \#P-hard otherwise. One particular problem in this problem space is a long-standing open problem of Moore and Robson on counting Cubic Planar X3C. The dichotomy resolves this problem by showing that it is \#P-hard. Our proof relies on the machinery of signature theory developed in the study of Holant problems. An essential ingredient in our proof of the main dichotomy theorem is a pure graph-theoretic result: Excepting some trivial cases, every 3-regular plane graph has a planar 3-way edge perfect matching. The proof technique of this graph-theoretic result is a combination of algebraic and combinatorial methods.
The P-time tractability criterion of the dichotomy is explicit. Other than the known classes of tractable constraint functions (degenerate, affine, product type, matchgates-transformable) we also identify a new infinite set of P-time computable planar Holant problems; however, its tractability is not by a direct holographic transformation to matchgates, but by a combination of this method and a global argument. The complexity dichotomy states that everything else in this Holant class is \#P-hard.
Submitted 29 March, 2023;
originally announced March 2023.
-
AlphaMat: A Material Informatics Hub Connecting Data, Features, Models and Applications
Authors:
Zhilong Wang,
Junfei Cai,
An Chen,
Yanqiang Han,
Kehao Tao,
Simin Ye,
Shiwei Wang,
Imran Ali,
**** Li
Abstract:
The development of modern civil industry, energy and information technology is inseparable from the rapid exploration of new materials, which is hampered by months to years of painstaking attempts, so that only a small fraction of materials in a vast chemical space have been determined. Artificial intelligence (AI)-based methods are promising to address this gap, but face many challenges such as data scarcity and inaccurate material descriptor coding. Here, we develop an AI platform, AlphaMat, that connects materials and applications. AlphaMat is not limited by the data scale (from $10^1$ to $10^6$) and can design structural and component descriptors that are effective for docking with various AI models. With prediction times of milliseconds and high accuracy, AlphaMat can model at least 12 common attributes (formation energy, band gap, ionic conductivity, magnetism, phonon property, bulk modulus, dielectric constant, adsorption energy, etc.), resulting in an unexplored material database with over 117,000 entries. We further demonstrate the ability of AlphaMat to mine and design materials, successfully discovering thousands of new materials in photonics, batteries, catalysts, and capacitors from the largest inorganic compound databases, which cover all elements in the periodic table. This work proposes the first material informatics hub that does not require users to have strong programming knowledge to build AI models to design materials. Users can either directly retrieve our database or easily build AI models through AlphaMat to discover and design the required materials. AlphaMat can shorten the cycle of database construction and material discovery by at least decades, and its effective use will facilitate the applications of AI technology in material science and lead scientific and technological progress to a new height.
Submitted 21 March, 2023;
originally announced March 2023.
-
An in-depth exploration of LAMOST Unknown spectra based on density clustering
Authors:
Haifeng Yang,
Xiaona Yin,
Jianghui Cai,
Yuqing Yang,
Ali Luo,
Zhongrui Bai,
Lichan Zhou,
Xujun Zhao,
Yaling Xun
Abstract:
LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope) has completed the observation of nearly 20 million celestial objects, including a class of spectra labeled `Unknown'. Besides a low signal-to-noise ratio, these spectra often show anomalous features that do not match current templates well. In this paper, a total of 638,000 `Unknown' spectra from LAMOST DR5 are selected, and an unsupervised analytical framework for `Unknown' spectra named SA-Frame (Spectra Analysis-Frame) is provided to explore their origins from different perspectives. The SA-Frame is composed of three parts: NAPC-Spec clustering, characterization and origin analysis. First, NAPC-Spec (Nonparametric density clustering algorithm for spectra) characterizes different features in the `Unknown' spectra by adjusting the influence space and divergence distance to minimize the effects of noise and high dimensionality, resulting in 13 types. Second, characteristic extraction and representation of clustering results are carried out based on spectral lines and continuum, where these 13 types are characterized as regular spectra with low S/Ns, splicing problems, suspected galactic emission signals, contamination from city light and un-gregarious type respectively. Third, a preliminary analysis of their origins is made from the characteristics of the observational targets, contamination from the sky, and the working status of the instruments. These results would be valuable for improving the overall data quality of large-scale spectral surveys.
Submitted 17 March, 2023;
originally announced March 2023.
-
Dynamically tunable moiré Rydberg excitons in a monolayer semiconductor on twisted bilayer graphene
Authors:
Minhao He,
Jiaqi Cai,
Huiyuan Zheng,
Eric Seewald,
Takashi Taniguchi,
Kenji Watanabe,
Jiaqiang Yan,
Matthew Yankowitz,
Abhay Pasupathy,
Wang Yao,
Xiaodong Xu
Abstract:
Moiré excitons are emergent optical excitations in 2D semiconductors with deep moiré superlattice potentials. While these excitations have been realized in several platforms, a system with dynamically tunable moiré potential to tailor the moiré exciton properties is yet to be realized. Here, we present a continuously tunable moiré potential in a monolayer WSe2 that is enabled by its proximity to twisted bilayer graphene (TBG) near the magic-angle. Due to its flat electronic bands, charge distribution is highly localized and forms a triangular lattice in TBG. Tuning the local charge density via electrostatic gating, TBG thus provides a spatially varying and dynamically tunable dielectric superlattice for modulating monolayer exciton wavefunctions. By performing optical reflection spectroscopy, we observe emergent moiré exciton Rydberg branches in monolayer WSe2 with increased energy splitting upon doping TBG. The twist-angle dependence reveals that the observation is due to a hybridization between bright and dark Rydberg states enabled by the moiré potential. Further, at the magic-angle near 1.1°, the moiré Rydberg excitons form a sawtooth pattern with doping owing to the formation of strongly correlated states in the TBG. Our study provides a new platform for engineering moiré excitons as well as optical accessibility to the electronic states with small correlation gaps in TBG.
Submitted 15 March, 2023;
originally announced March 2023.
-
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
Authors:
Haoyu He,
Jianfei Cai,
**g Zhang,
Dacheng Tao,
Bohan Zhuang
Abstract:
Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks; it tunes only a small number of parameters while freezing the vast majority of them to ease the storage burden and optimization difficulty. However, existing PEFT methods introduce trainable parameters to the same positions across different tasks, relying solely on human heuristics and neglecting the domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, our SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, our SPT further boosts the representational capability for the weight matrices whose number of sensitive parameters exceeds a pre-defined threshold by utilizing existing structured tuning methods, e.g., LoRA [23] or Adapter [22], to replace directly tuning the selected sensitive parameters (unstructured tuning) under the budget. Extensive experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods and largely boosts their performance, e.g., SPT improves Adapter with supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy, reaching SOTA performance on FGVC and VTAB-1k benchmarks, respectively. Source code is at https://github.com/ziplab/SPT
Submitted 31 August, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
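The two-stage allocation described above can be sketched as follows. This is a hedged illustration, not the SPT code:

1. Score every parameter with a first-order Taylor estimate, |w * dL/dw|. This sensitivity measure is an assumption of the sketch.
2. Select the global top `budget` parameters as the sensitive ones.
3. Mark any weight matrix whose count of selected parameters exceeds `threshold` for structured tuning (e.g. LoRA or Adapter) instead of unstructured tuning.

The function name, toy weights, and gradients are hypothetical. The sketch also ignores how structured tuning is charged against the budget.

```python
def allocate(weights, grads, budget, threshold):
    """Assign each weight matrix to 'structured', 'unstructured', or 'frozen'.

    weights, grads: dict mapping matrix name -> flat list of floats.
    budget: total number of sensitive parameters to select globally.
    threshold: matrices with more selected parameters than this use structured tuning.
    """
    # First-order Taylor sensitivity of each individual parameter: |w * dL/dw|.
    scored = [
        (abs(w * g), name, i)
        for name in weights
        for i, (w, g) in enumerate(zip(weights[name], grads[name]))
    ]
    scored.sort(reverse=True)
    selected = scored[:budget]

    # Count how many sensitive parameters fall into each matrix.
    counts = {name: 0 for name in weights}
    for _, name, _ in selected:
        counts[name] += 1

    plan = {}
    for name, count in counts.items():
        if count > threshold:
            plan[name] = "structured"    # e.g. attach LoRA / Adapter to this matrix
        elif count > 0:
            plan[name] = "unstructured"  # tune only the selected entries
        else:
            plan[name] = "frozen"
    return plan

weights = {"attn.q": [0.5, -1.0, 2.0, 0.1], "mlp.fc1": [0.2, 0.3, -0.1, 0.05], "mlp.fc2": [1e-3] * 4}
grads = {"attn.q": [1.0, 1.0, 1.0, 1.0], "mlp.fc1": [1.0, -2.0, 0.1, 0.0], "mlp.fc2": [0.01] * 4}
print(allocate(weights, grads, budget=4, threshold=2))
```

The design choice mirrors the abstract. Sensitivity is computed per task from data, so the positions that receive trainable parameters can differ between tasks, whereas heuristic PEFT placement fixes them in advance.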
-
Interlacing Polynomial Method for the Column Subset Selection Problem
Authors:
Jian-Feng Cai,
Zhiqiang Xu,
Zili Xu
Abstract:
This paper investigates the spectral norm version of the column subset selection problem. Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a positive integer $k\leq\text{rank}(\mathbf{A})$, the objective is to select exactly $k$ columns of $\mathbf{A}$ that minimize the spectral norm of the residual matrix after projecting $\mathbf{A}$ onto the space spanned by the selected columns. We use the method of interlacing polynomials introduced by Marcus-Spielman-Srivastava to derive a new upper bound on the minimal approximation error. This new bound is asymptotically sharp when the matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ obeys a spectral power-law decay. The relevant expected characteristic polynomials can be written as an extension of the expected polynomial for the restricted invertibility problem, incorporating two extra variable substitution operators. Finally, we propose a deterministic polynomial-time algorithm that achieves this error bound up to a computational error.
Submitted 7 January, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
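To make the objective concrete: for a column subset S, the error is the spectral norm of A - P_S A, where P_S is the orthogonal projection onto the span of the selected columns. The sketch below evaluates this objective by exhaustive search in pure Python. It uses Gram-Schmidt for the projection and power iteration on R^T R for the spectral norm. It is a brute-force reference for small matrices only, not the interlacing-polynomial bound or the deterministic polynomial-time algorithm described in the abstract. All function names are illustrative.

```python
import itertools
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def columns(A):
    """Columns of a row-major matrix A, as lists."""
    return [list(col) for col in zip(*A)]

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip linearly dependent columns
            basis.append([wi / norm for wi in w])
    return basis

def spectral_norm(cols, iters=500):
    """Largest singular value of the matrix with these columns (power iteration on R^T R)."""
    d = len(cols)
    gram = [[dot(cols[i], cols[j]) for j in range(d)] for i in range(d)]
    x = [1.0 + 0.1 * i for i in range(d)]
    lam = 0.0
    for _ in range(iters):
        y = [dot(row, x) for row in gram]
        norm = math.sqrt(dot(y, y))
        if norm == 0.0:
            return 0.0
        x = [yi / norm for yi in y]
        lam = dot(x, [dot(row, x) for row in gram])  # Rayleigh quotient
    return math.sqrt(max(lam, 0.0))

def residual_error(A, subset):
    """Spectral norm of A minus its projection onto the span of the selected columns."""
    cols = columns(A)
    basis = orthonormal_basis([cols[j] for j in subset])
    residual = []
    for a in cols:
        r = list(a)
        for q in basis:
            c = dot(q, a)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        residual.append(r)
    return spectral_norm(residual)

def best_subset(A, k):
    """Exhaustive search over all k-column subsets (exponential cost; tiny inputs only)."""
    d = len(A[0])
    return min(
        ((s, residual_error(A, s)) for s in itertools.combinations(range(d), k)),
        key=lambda pair: pair[1],
    )

A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
subset, err = best_subset(A, k=1)
print(subset, round(err, 6))
```

For A = diag(3, 2, 1) and k = 1, keeping the first column leaves a residual whose spectral norm is 2, the smallest of the three choices. The exponential cost of this search is what makes a polynomial-time algorithm with a provable bound worthwhile.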
-
Reliability-Adaptive Consistency Regularization for Weakly-Supervised Point Cloud Segmentation
Authors:
Zhonghua Wu,
Yicheng Wu,
Guosheng Lin,
Jianfei Cai
Abstract:
Weakly-supervised point cloud segmentation with extremely limited labels is highly desirable to alleviate the expensive costs of collecting densely annotated 3D points. This paper explores applying consistency regularization, which is commonly used in weakly-supervised learning, to its point cloud counterpart with multiple data-specific augmentations, a setting that has not been well studied. We observe that the straightforward way of applying consistency constraints to weakly-supervised point cloud segmentation has two major limitations: noisy pseudo labels due to the conventional confidence-based selection and insufficient consistency constraints due to discarding unreliable pseudo labels. Therefore, we propose a novel Reliability-Adaptive Consistency Network (RAC-Net) that uses both prediction confidence and model uncertainty to measure the reliability of pseudo labels and applies consistency training to all unlabeled points, with different consistency constraints for different points based on the reliability of the corresponding pseudo labels. Experimental results on the S3DIS and ScanNet-v2 benchmark datasets show that our model achieves superior performance in weakly-supervised point cloud segmentation. The code will be released publicly at https://github.com/wu-zhonghua/RAC-Net.
Submitted 14 December, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
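The difference between discarding unreliable pseudo labels and constraining every point differently can be sketched as follows. This is a hypothetical illustration, not RAC-Net:

- Reliability is assumed to be prediction confidence times (1 - a per-point uncertainty in [0, 1]).
- Points with reliability at or above `tau` get a hard pseudo-label cross-entropy term.
- The remaining points get a softer squared-error agreement term between the two augmented views, so no unlabeled point is dropped.

```python
import math

def consistency_loss(p_weak, p_strong, uncertainty, tau=0.5):
    """Hypothetical reliability-adaptive consistency loss over unlabeled points.

    p_weak, p_strong: per-point class-probability lists from two augmented views.
    uncertainty: per-point model uncertainty in [0, 1] (e.g. from MC dropout).
    """
    total = 0.0
    for pw, ps, u in zip(p_weak, p_strong, uncertainty):
        confidence = max(pw)
        reliability = confidence * (1.0 - u)
        if reliability >= tau:
            # Reliable: treat the weak-view prediction as a hard pseudo label.
            label = pw.index(confidence)
            total += -math.log(max(ps[label], 1e-12))
        else:
            # Unreliable: keep the point, but only ask the two views to agree softly.
            total += sum((a - b) ** 2 for a, b in zip(pw, ps))
    return total / len(p_weak)

p_weak = [[0.9, 0.1], [0.6, 0.4]]
p_strong = [[0.8, 0.2], [0.5, 0.5]]
print(consistency_loss(p_weak, p_strong, uncertainty=[0.1, 0.5]))
```

A confidence-only filter would discard the second point entirely. Here it still contributes a gradient, which addresses the "insufficient consistency constraints" limitation the abstract describes.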
-
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Authors:
Sen Fang,
Yangjian Wu,
Bowen Gao,
**gwen Cai,
Teik Toe Teoh
Abstract:
Recently, researchers have gradually realized that in some cases, self-supervised pre-training on large-scale Internet data is better than training on high-quality/manually labeled data sets, and multimodal/large models are better than single or bimodal/small models. In this paper, we propose a robust audio representation learning method WavBriVL based on Bridging-Vision-and-Language (BriVL). WavBriVL projects audio, image and text into a shared embedded space, so that multi-modal applications can be realized. We demonstrate the qualitative evaluation of the image generated from WavBriVL as a shared embedded space, with the main purposes of this paper: (1) learning the correlation between audio and image; (2) exploring a new way of image generation, that is, using audio to generate pictures. Experimental results show that this method can effectively generate appropriate images from audio.
Submitted 28 July, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Properties of Position Matrices and Their Elections
Authors:
Niclas Boehmer,
**-Yi Cai,
Piotr Faliszewski,
Austen Z. Fan,
Łukasz Janeczko,
Andrzej Kaczmarczyk,
Tomasz Wąs
Abstract:
We study the properties of elections that have a given position matrix (in such elections each candidate is ranked on each position by a number of voters specified in the matrix). We show that counting elections that generate a given position matrix is #P-complete. Consequently, sampling such elections uniformly at random seems challenging and we propose a simpler algorithm, without hard guarantees. Next, we consider the problem of testing if a given matrix can be implemented by an election with a certain structure (such as single-peakedness or group-separability). Finally, we consider the problem of checking if a given position matrix can be implemented by an election with a Condorcet winner. We complement our theoretical findings with experiments.
Submitted 9 March, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
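To make the central object concrete: a position matrix records, for each candidate and each position, how many voters rank that candidate at that position. The Python sketch below is illustrative only (the function name is hypothetical, not from the paper). It builds the matrix from a list of full rankings and shows why counting is nontrivial: two different elections can generate the same position matrix.

```python
from typing import List, Sequence


def position_matrix(votes: Sequence[Sequence[int]], m: int) -> List[List[int]]:
    """Return the m x m matrix P with P[c][p] = number of voters ranking candidate c at position p.

    Each vote is a full ranking: a permutation of candidates 0..m-1, most preferred first.
    """
    P = [[0] * m for _ in range(m)]
    for ranking in votes:
        if sorted(ranking) != list(range(m)):
            raise ValueError(f"not a full ranking of {m} candidates: {ranking}")
        for pos, cand in enumerate(ranking):
            P[cand][pos] += 1
    return P


print(position_matrix([[0, 1, 2], [1, 0, 2], [0, 2, 1]], 3))
# [[2, 1, 0], [1, 1, 1], [0, 1, 2]]

# Two different elections that generate the same (all-ones) position matrix.
cyclic = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
reversed_cyclic = [[0, 2, 1], [2, 1, 0], [1, 0, 2]]
print(position_matrix(cyclic, 3) == position_matrix(reversed_cyclic, 3))  # True
```

Every row and every column of a position matrix sums to the number of voters, so the matrix is a scaled doubly stochastic matrix; many elections can map to the same one.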
-
Knowledge Graph Completion with Counterfactual Augmentation
Authors:
Heng Chang,
Jie Cai,
Jia Li
Abstract:
In recent years, Graph Neural Networks (GNNs) have demonstrated great success in Knowledge Graph Completion (KGC) by modeling how entities and relations interact. However, most of them are designed to learn from the observed graph structure, which tends to have an imbalanced relation distribution during training. Motivated by the causal relationships among entities on a knowledge graph, we explore this defect through a counterfactual question: "would the relation still exist if the neighborhood of entities became different from observation?" With a carefully designed instantiation of a causal model on the knowledge graph, we generate counterfactual relations to answer this question by treating the representations of an entity pair given a relation as the context, the structural information of the relation-aware neighborhood as the treatment, and the validity of the composed triplet as the outcome. Furthermore, we incorporate the created counterfactual relations into the GNN-based framework on KGs to augment its learning of entity pair representations from both the observed and counterfactual relations. Experiments on benchmarks show that our proposed method outperforms existing methods on the task of KGC, achieving new state-of-the-art results. Moreover, we demonstrate that the proposed counterfactual relation-based augmentation also enhances the interpretability of the GNN-based framework through path interpretations of predictions.
Submitted 25 February, 2023;
originally announced February 2023.
-
Dependency Dialogue Acts -- Annotation Scheme and Case Study
Authors:
Jon Z. Cai,
Brendan King,
Margaret Perkoff,
Shiran Dudy,
Jie Cao,
Marie Grace,
Natalia Wojarnik,
Ananya Ganesh,
James H. Martin,
Martha Palmer,
Marilyn Walker,
Jeffrey Flanigan
Abstract:
In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party, multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialog units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA emphasizes adequately capturing how a speaker uses the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations.
Submitted 24 February, 2023;
originally announced February 2023.
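One way to picture the annotation structure described in the DDA abstract, where a single dialog unit can carry several response relations (dialog acts or rhetorical relations) pointing at particular earlier utterances, is as a small graph of units. The Python sketch below is a hypothetical encoding; the class names, field names, and relation labels are illustrative assumptions, not the DDA schema or label set.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResponseRelation:
    target_id: str  # id of the earlier dialog unit being responded to
    label: str      # dialog act or rhetorical relation (labels here are made up)


@dataclass
class DialogUnit:
    unit_id: str
    speaker: str
    text: str
    relations: List[ResponseRelation] = field(default_factory=list)

    def respond_to(self, target: "DialogUnit", label: str) -> None:
        # Overloading is allowed: a unit may carry several relations,
        # including more than one relation to the same target unit.
        self.relations.append(ResponseRelation(target.unit_id, label))


def responders_by_target(units: List[DialogUnit]) -> Dict[str, List[str]]:
    """Map each target unit id to the ids of the units that respond to it."""
    index: Dict[str, List[str]] = {}
    for unit in units:
        for rel in unit.relations:
            responders = index.setdefault(rel.target_id, [])
            if unit.unit_id not in responders:
                responders.append(unit.unit_id)
    return index


u1 = DialogUnit("u1", "A", "Should we meet on Friday?")
u2 = DialogUnit("u2", "B", "Friday works, but can we start later?")
u3 = DialogUnit("u3", "C", "I can't make Friday.")
u2.respond_to(u1, "answer")
u2.respond_to(u1, "counter-proposal")  # second relation on the same unit
u3.respond_to(u1, "answer")
print(responders_by_target([u1, u2, u3]))  # {'u1': ['u2', 'u3']}
```

Storing relations on the responding unit, rather than as a single parent pointer, is what lets one utterance answer one speaker while also relating to another thread.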
-
The Study of Circumgalactic Medium with Quasar Pairs
Authors:
Zhi-Fu Chen,
Huan-Chang Qin,
**-Ting Cai,
Yu-Tao Zhou,
Zhe-Geng Chen,
Ting-Ting Pang,
Zhi-Wen Wang
Abstract:
We have collected 10025 foreground-background quasar pairs with projected distances $d_p<500$ kpc from the large quasar catalog of SDSS DR16Q. We investigate the properties of Mg II absorption lines with $W_r>0.15$ Å around foreground quasars, including both LOS (along the lines of sight of the foreground quasars) and transverse (TRA, perpendicular to the LOS) absorption. Both the equivalent width (correlation coefficient $\rho=-0.915$, with probability $P < 10^{-4}$ of no correlation) and the incident rate ($\rho=-0.964$, $P < 10^{-6}$) of TRA Mg II absorption lines are strongly anti-correlated with projected distance. The incident rate of TRA Mg II absorption lines is significantly ($>4\sigma$) greater than that of LOS Mg II absorption lines at projected distances $d_p<200$ kpc, while TRA and LOS Mg II absorption lines have similar ($<3\sigma$) incident rates at scales $d_p>200$ kpc. Anisotropic radiation from quasars is likely the most plausible interpretation of the anisotropic absorption around quasars. This could also indicate that quasar radiation does not noticeably affect the gas halos of quasars at scales $d_p>200$ kpc.
Submitted 24 February, 2023;
originally announced February 2023.
-
KG-ECO: Knowledge Graph Enhanced Entity Correction for Query Rewriting
Authors:
**glun Cai,
Mingda Li,
Ziyan Jiang,
Eunah Cho,
Zheng Chen,
Yang Liu,
Xing Fan,
Chenlei Guo
Abstract:
Query Rewriting (QR) plays a critical role in large-scale dialogue systems for reducing friction. Entity errors pose extra challenges for a dialogue system in producing satisfactory responses. In this work, we propose KG-ECO: Knowledge Graph enhanced Entity COrrection for query rewriting, an entity correction system with corrupt entity span detection and entity retrieval/re-ranking functionalities. To boost model performance, we incorporate a Knowledge Graph (KG) to provide entity structural information (neighboring entities encoded by graph neural networks) and textual information (KG entity descriptions encoded by RoBERTa). Experimental results show that our approach yields a clear performance gain over two baselines: utterance-level QR and entity correction without KG information. The proposed system is particularly effective for few-shot learning cases where target entities are rarely seen in training or where there is a KG relation between the target entity and other contextual entities in the query.
Submitted 22 February, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
HTNet: Human Topology Aware Network for 3D Human Pose Estimation
Authors:
Jialun Cai,
Hong Liu,
Runwei Ding,
Wenhao Li,
Jianbing Wu,
Miaoju Ban
Abstract:
Errors in 3D human pose estimation propagate along the human body topology and accumulate at the end joints of limbs. Inspired by the backtracking mechanism in automatic control systems, we design an Intra-Part Constraint module that uses the parent nodes as references to build topological constraints for end joints at the part level. Further considering the hierarchy of the human topology, we capture joint-level and body-level dependencies via graph convolutional networks and self-attention, respectively. Based on these designs, we propose a novel Human Topology aware Network (HTNet), which adopts a channel-split progressive strategy to sequentially learn the structural priors of the human topology at multiple semantic levels: joint, part, and body. Extensive experiments show that the proposed method improves estimation accuracy on the end joints of limbs by 18.7% and achieves state-of-the-art results on the Human3.6M and MPI-INF-3DHP datasets. Code is available at https://github.com/vefalun/HTNet.
Submitted 20 February, 2023;
originally announced February 2023.
-
The complexity of counting planar graph homomorphisms of domain size 3
Authors:
**-Yi Cai,
Ashwin Maran
Abstract:
We prove a complexity dichotomy theorem for counting planar graph homomorphisms of domain size 3. Given any 3×3 real-valued symmetric matrix $H$, which defines the graph homomorphism function $G \mapsto Z_H(G)$ on all planar graphs $G$, we completely classify the computational complexity of this problem according to $H$. We show that for every $H$, the problem is either polynomial-time computable or #P-hard. The P-time computable cases are precisely those that are P-time computable for general graphs (for which a complete classification is known) or computable by Valiant's holographic algorithm via matchgates. We also prove several results about planar graph homomorphisms for general domain size $q$. The proof mainly uses analytic arguments.
Submitted 16 February, 2023;
originally announced February 2023.
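For reference, the quantity being classified is the partition function $Z_H(G)=\sum_{\sigma:V(G)\to[q]}\prod_{\{u,v\}\in E(G)} H_{\sigma(u)\sigma(v)}$, here with $q=3$. The brute-force Python sketch below simply evaluates that definition in $q^{|V|}$ time. It illustrates what is being counted and is not one of the polynomial-time algorithms from the dichotomy. With $H$ equal to the 0/1 matrix that has zeros on the diagonal and ones elsewhere, $Z_H(G)$ counts proper 3-colorings of $G$.

```python
from itertools import product
from typing import Sequence, Tuple


def partition_function(H: Sequence[Sequence[float]], n: int,
                       edges: Sequence[Tuple[int, int]]) -> float:
    """Evaluate Z_H(G) by brute force for a graph G on vertices 0..n-1.

    H must be a symmetric q x q matrix; each map sigma: V -> [q] contributes
    the product of H[sigma(u)][sigma(v)] over all edges {u, v}.
    """
    q = len(H)
    total = 0.0
    for sigma in product(range(q), repeat=n):
        weight = 1.0
        for u, v in edges:
            weight *= H[sigma[u]][sigma[v]]
        total += weight
    return total


# Zero diagonal, ones elsewhere: Z_H(G) counts proper 3-colorings of G.
COLORING = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
triangle = [(0, 1), (1, 2), (0, 2)]  # a planar graph
path = [(0, 1), (1, 2)]
print(partition_function(COLORING, 3, triangle))  # 6.0
print(partition_function(COLORING, 3, path))      # 12.0
```

The exponential loop is exactly why the dichotomy matters: for each $H$, the theorem says whether this sum can be computed in polynomial time on planar inputs or is #P-hard.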
-
Bayesian-based hybrid method for rapid optimization of NV center sensors
Authors:
Jiazhao Tian,
Ressa S. Said,
Fedor Jelezko,
Jianming Cai,
Liantuan Xiao
Abstract:
The NV center is one of the most promising platforms in the field of quantum sensing. NV-center-based magnetometry, in particular, has made concrete progress in biomedicine and medical diagnostics. Improving the sensitivity of NV center sensors under wide inhomogeneous broadening and field amplitude drift is a crucial issue of continuing concern, and it relies on coherent control of the NV center with higher average fidelity. Quantum optimal control (QOC) methods provide a route to this goal; however, the high time cost of current methods, due to the large number of required sample points and the complexity of the parameter space, has hindered their usability. In this paper, we propose the Bayesian estimation phase-modulated (B-PM) method to tackle this problem. For state transformation of an NV center ensemble, the B-PM method reduces the time cost by more than $90\%$ compared to the conventional standard Fourier base (SFB) method, while increasing the average fidelity from $0.894$ to $0.905$. In an AC magnetometry scenario, the optimized control pulse given by the B-PM method achieves an eightfold extension of the coherence time $T_2$ compared to a rectangular $\pi$ pulse. Similar applications are possible in other sensing situations. As a general algorithm, the B-PM method can be further extended to open- and closed-loop optimization of complex systems based on a variety of quantum platforms.
Submitted 16 February, 2023;
originally announced February 2023.
-
Real-time adaptive sensing of nuclear spins by a single-spin quantum sensor
Authors:
**gcheng Wang,
Dongxiao Li,
Ralf Betzholz,
Jianming Cai
Abstract:
Quantum sensing is considered one of the most promising subfields of quantum information for delivering practical quantum advantages in real-world applications. However, its impressive capabilities, including high sensitivity, are often hindered by the limited quantum resources available. Here, we incorporate the expected information gain (EIG) and techniques such as accelerated computation into Bayesian experimental design (BED) in order to use quantum resources more efficiently. A simulated nitrogen-vacancy center in diamond is used to demonstrate real-time operation of the BED. Instead of heuristics, the EIG is used to choose optimal control parameters in real time. Moreover, by combining the BED with accelerated computation and asynchronous operations, we find that up to a tenfold speed-up in absolute time cost can be achieved in sensing multiple surrounding $^{13}$C nuclear spins. Our work explores the possibilities of applying the EIG to BED-based quantum-sensing tasks and provides techniques useful for integrating BED into more general quantum sensing systems.
Submitted 16 February, 2023;
originally announced February 2023.
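The expected information gain of a control setting $\tau$ is the mutual information between the next measurement outcome $y$ and the unknown parameter $\theta$: $\mathrm{EIG}(\tau)=h\big(\sum_\theta p(\theta)\,p(y{=}1\mid\theta,\tau)\big)-\sum_\theta p(\theta)\,h\big(p(y{=}1\mid\theta,\tau)\big)$, where $h$ is the binary entropy. The Python sketch below evaluates this on a discrete parameter grid, greedily picks the $\tau$ with the largest EIG, and performs the Bayesian update. The likelihood $p(y{=}1\mid\theta,\tau)=(1+\cos\theta\tau)/2$ is a hypothetical Ramsey-type model chosen for illustration, not the NV-center/nuclear-spin model used in the paper, and none of the paper's acceleration or asynchronous techniques are shown.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def prob_one(theta: float, tau: float) -> float:
    # Hypothetical Ramsey-type likelihood of observing outcome 1.
    return 0.5 * (1.0 + math.cos(theta * tau))


def expected_information_gain(thetas: Sequence[float], prior: Sequence[float],
                              tau: float) -> float:
    """Mutual information between the next outcome and theta under the prior."""
    p1 = [prob_one(t, tau) for t in thetas]
    marginal = sum(w * p for w, p in zip(prior, p1))
    expected_conditional = sum(w * binary_entropy(p) for w, p in zip(prior, p1))
    return binary_entropy(marginal) - expected_conditional


def choose_control(thetas: Sequence[float], prior: Sequence[float],
                   taus: Sequence[float]) -> float:
    """Greedy, one-step-ahead choice: the control setting with the largest EIG."""
    return max(taus, key=lambda tau: expected_information_gain(thetas, prior, tau))


def bayes_update(thetas: Sequence[float], prior: Sequence[float],
                 tau: float, outcome: int) -> List[float]:
    """Posterior over the theta grid after observing outcome (0 or 1) at tau."""
    like = [prob_one(t, tau) if outcome == 1 else 1.0 - prob_one(t, tau) for t in thetas]
    post = [w * lk for w, lk in zip(prior, like)]
    z = sum(post)
    if z == 0.0:
        raise ValueError("observed outcome has zero probability under the prior")
    return [p / z for p in post]


thetas = [0.1 * k for k in range(1, 31)]   # candidate parameter values
prior = [1.0 / len(thetas)] * len(thetas)  # uniform prior
taus = [0.05 * k for k in range(1, 41)]    # candidate control settings
tau_star = choose_control(thetas, prior, taus)
posterior = bayes_update(thetas, prior, tau_star, outcome=1)
```

In a real loop, `posterior` becomes the next `prior` and the selection is repeated; the cost of re-evaluating the EIG over all candidates at every step is what motivates accelerated computation.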
-
Robustness of random-control quantum-state tomography
Authors:
**gcheng Wang,
Shaoliang Zhang,
Jianming Cai,
Zhenyu Liao,
Christian Arenz,
Ralf Betzholz
Abstract:
In a recently demonstrated quantum-state tomography scheme [Phys. Rev. Lett. 124, 010405 (2020)], a random control field is locally applied to a multipartite system to reconstruct the full quantum state of the system through single-observable measurements. Here, we analyze the robustness of such a tomography scheme against measurement errors. We characterize the sensitivity to measurement errors using the logarithm of the condition number of a linear system that fully describes the tomography process. Using results from random matrix theory, we derive the scaling law of the logarithm of this condition number with respect to the system size when Haar-random evolutions are considered. While this expression is independent of how the Haar randomness is created, we also perform numerical simulations to investigate the temporal behavior of the robustness for two specific quantum systems driven by a single random control field. Interestingly, we find that before the mean value of the logarithm of the condition number, as a function of driving time, asymptotically approaches the value predicted for a Haar-random evolution, it reaches a plateau whose length increases with the system size.
Submitted 10 August, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Stitchable Neural Networks
Authors:
Zizheng Pan,
Jianfei Cai,
Bohan Zhuang
Abstract:
The public model zoo, containing an enormous number of powerful pretrained model families (e.g., ResNet/DeiT), has reached an unprecedented scope, which significantly contributes to the success of deep learning. As each model family consists of pretrained models at diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how can these readily available models in a family be efficiently assembled for dynamic accuracy-efficiency trade-offs at runtime? To this end, we present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment. Given a family of pretrained neural networks, which we call anchors, it cheaply produces numerous networks with different complexity and performance trade-offs. Specifically, SN-Net splits the anchors across the blocks/layers and then stitches them together with simple stitching layers that map the activations from one anchor to another. With only a few epochs of training, SN-Net effectively interpolates between the performance of anchors of varying scales. At runtime, SN-Net can instantly adapt to dynamic resource constraints by switching the stitching positions. Extensive experiments on ImageNet classification demonstrate that SN-Net can obtain on-par or even better performance than many individually trained networks while supporting diverse deployment scenarios. For example, by stitching Swin Transformers, we challenge hundreds of models in the Timm model zoo with a single network. We believe this new elastic model framework can serve as a strong baseline for further research in wider communities.
Submitted 28 March, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Deep Orthogonal Hypersphere Compression for Anomaly Detection
Authors:
Yunhe Zhang,
Yan Sun,
**yu Cai,
Jicong Fan
Abstract:
Many well-known and effective anomaly detection methods assume that a reasonable decision boundary has a hypersphere shape, which however is difficult to obtain in practice and is not sufficiently compact, especially when the data are in high-dimensional spaces. In this paper, we first propose a novel deep anomaly detection model that improves the original hypersphere learning through an orthogonal projection layer, which ensures that the training data distribution is consistent with the hypersphere hypothesis, thereby increasing the true positive rate and decreasing the false negative rate. Moreover, we propose a bi-hypersphere compression method to obtain a hyperspherical shell that yields a more compact decision region than a hyperball, which is demonstrated theoretically and numerically. The proposed methods are not confined to common datasets such as image and tabular data, but are also extended to a more challenging but promising scenario, graph-level anomaly detection, which learns graph representation with maximum mutual information between the substructure and global structure features while exploring orthogonal single- or bi-hypersphere anomaly decision boundaries. The numerical and visualization results on benchmark datasets demonstrate the superiority of our methods in comparison to many baselines and state-of-the-art methods.
Submitted 4 May, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
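The difference between a hyperball and a hyperspherical-shell decision region can be seen even without a deep model. In the Python sketch below, raw Gaussian vectors stand in for learned embeddings (an assumption for illustration), and the center and radii are set from training distances by simple quantiles, a hypothetical choice rather than the paper's training objective or projection layer. Because high-dimensional Gaussian data concentrate in a thin shell, the region near the center contains no training data; the shell flags a point there, while the ball accepts it.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_center(data: Sequence[Vector]) -> List[float]:
    dim = len(data[0])
    return [sum(x[i] for x in data) / len(data) for i in range(dim)]


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Simple index-based quantile of an already sorted sequence."""
    idx = min(len(sorted_vals) - 1, max(0, int(q * (len(sorted_vals) - 1))))
    return sorted_vals[idx]


def fit_radii(data: Sequence[Vector], center: Vector,
              lo: float = 0.05, hi: float = 0.95) -> Tuple[float, float]:
    """Return (r_in, r_out) from quantiles of training distances to the center."""
    d = sorted(math.dist(x, center) for x in data)
    return quantile(d, lo), quantile(d, hi)


def ball_anomaly(x: Vector, center: Vector, r_out: float) -> bool:
    # Hyperball: only points outside radius r_out are anomalous.
    return math.dist(x, center) > r_out


def shell_anomaly(x: Vector, center: Vector, r_in: float, r_out: float) -> bool:
    # Hyperspherical shell: points too close to the center are anomalous as well.
    r = math.dist(x, center)
    return r < r_in or r > r_out


rng = random.Random(0)
dim = 50
train = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(500)]
c = mean_center(train)
r_in, r_out = fit_radii(train, c)

origin = [0.0] * dim                            # no training data lies this close to c
typical = [math.sqrt(dim)] + [0.0] * (dim - 1)  # norm close to the typical training norm
print(ball_anomaly(origin, c, r_out), shell_anomaly(origin, c, r_in, r_out))  # False True
```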
-
Vector Quantized Wasserstein Auto-Encoder
Authors:
Tung-Long Vuong,
Trung Le,
He Zhao,
Chuanxia Zheng,
Mehrtash Harandi,
Jianfei Cai,
Dinh Phung
Abstract:
Learning deep discrete latent representations offers the promise of better symbolic and summarized abstractions that are more useful for downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE), most work on learning deep discrete representations has focused on improving the original VQ-VAE formulation, and none of it has studied learning deep discrete representations from the generative viewpoint. In this work, we study learning deep discrete representations from the generative viewpoint. Specifically, we place discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over codeword sequences to the data distribution by minimizing a Wasserstein (WS) distance between them. We further develop theory connecting this to the clustering viewpoint of the WS distance, which yields a better and more controllable clustering solution. Finally, we empirically evaluate our method on several well-known benchmarks, where it achieves better qualitative and quantitative performance than other VQ-VAE variants in terms of codebook utilization and image reconstruction/generation.
Submitted 17 June, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
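As background for the VQ-WAE abstract, the discrete representation in VQ-VAE-style models comes from assigning each encoder output to a codeword in a finite codebook. The Python sketch below shows the standard nearest-codeword assignment, the resulting mean squared distortion, and a simple codebook-utilization measure (fraction of codewords used); it is not the Wasserstein-based VQ-WAE objective, and the toy vectors are made up. It does hint at the clustering view: when the weights on the codewords are left free, the smallest squared 2-Wasserstein distance from the empirical data to a distribution supported on the codebook equals this nearest-codeword distortion.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def quantize(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Index of the nearest codeword (Euclidean distance) for each vector."""
    return [min(range(len(codebook)), key=lambda k: math.dist(v, codebook[k]))
            for v in vectors]


def distortion(vectors: Sequence[Vector], codebook: Sequence[Vector],
               indices: Sequence[int]) -> float:
    """Mean squared distance from each vector to its assigned codeword."""
    return sum(math.dist(v, codebook[k]) ** 2 for v, k in zip(vectors, indices)) / len(vectors)


def codebook_utilization(indices: Sequence[int], codebook_size: int) -> float:
    """Fraction of codewords used at least once."""
    return len(set(indices)) / codebook_size


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
z = [[0.1, 0.2], [0.9, 1.2], [0.2, -0.1]]  # toy stand-ins for encoder outputs
idx = quantize(z, codebook)
print(idx)                             # [0, 1, 0]
print(codebook_utilization(idx, 3))    # 0.6666666666666666
print(distortion(z, codebook, idx))
```

Here the third codeword is never used, which is the kind of low codebook utilization the abstract reports improving on.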
-
Deep Graph-Level Clustering Using Pseudo-Label-Guided Mutual Information Maximization Network
Authors:
**yu Cai,
Yi Han,
Wenzhong Guo,
Jicong Fan
Abstract:
In this work, we study the problem of partitioning a set of graphs into groups such that graphs in the same group are similar while graphs in different groups are dissimilar. This problem has rarely been studied, although there has been much work on node clustering and graph classification. The problem is challenging because it is difficult to measure the similarity or distance between graphs. One feasible approach is to use graph kernels to compute a similarity matrix for the graphs and then perform spectral clustering, but existing graph kernels are of very limited effectiveness in measuring the similarity between graphs. To solve the problem, we propose a novel method called Deep Graph-Level Clustering (DGLC). DGLC uses a graph isomorphism network to learn graph-level representations by maximizing the mutual information between the representations of entire graphs and substructures, under the regularization of a clustering module that ensures discriminative representations via pseudo labels. DGLC achieves graph-level representation learning and graph-level clustering in an end-to-end manner. Experimental results on six benchmark graph datasets show that DGLC achieves state-of-the-art performance compared with many baselines.
Submitted 5 February, 2023;
originally announced February 2023.
-
Strong quantum metrological limit from many-body physics
Authors:
Yaoming Chu,
Xiangbei Li,
Jianming Cai
Abstract:
Surpassing the standard quantum limit, and even reaching the Heisenberg limit, using quantum entanglement represents the Holy Grail of quantum metrology. However, quantum entanglement is a valuable resource that does not come without a price. The exceptional time overhead for preparing large-scale entangled states raises disconcerting concerns about whether the Heisenberg limit is fundamentally achievable. Here, we find a universal speed limit on quantum Fisher information growth, set by the Lieb-Robinson light cone, that characterizes the metrological potential of quantum resource states during their preparation. Our main result establishes a strong precision limit for quantum metrology that accounts for the complexity of preparing many-body quantum resource states, and reveals a fundamental constraint on reaching the Heisenberg limit in a generic many-body lattice system with bounded one-site energy. It enables us to identify the essential features of quantum many-body systems that are crucial for achieving the quantum advantage of quantum metrology, and establishes an interesting connection between many-body quantum dynamics and quantum metrology.
Submitted 11 April, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions
Authors:
Junyang Cai,
Khai-Nguyen Nguyen,
Nishant Shrestha,
Aidan Good,
Ruisen Tu,
Xin Yu,
Shandian Zhe,
Thiago Serra
Abstract:
One surprising trait of neural networks is the extent to which their connections can be pruned with little to no effect on accuracy. But once we cross a critical level of parameter sparsity, pruning any further leads to a sudden drop in accuracy. This drop plausibly reflects a loss in model complexity, which we aim to avoid. In this work, we explore how sparsity also affects the geometry of the linear regions defined by a neural network, and consequently reduces the expected maximum number of linear regions based on the architecture. We observe that pruning affects accuracy similarly to how sparsity affects the number of linear regions and our proposed bound on the maximum number. Conversely, we find that choosing the sparsity of each layer to maximize our bound very often improves accuracy compared to pruning the same amount with uniform sparsity across all layers, thereby providing guidance on where to prune.
Submitted 19 January, 2023;
originally announced January 2023.
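A one-dimensional toy case shows how sparsity can reduce the number of linear regions. For a single hidden layer of ReLUs on a scalar input, $x \mapsto \mathrm{ReLU}(w_j x + b_j)$, each unit with $w_j \neq 0$ switches on or off at $x=-b_j/w_j$, so the activation regions are the intervals between distinct breakpoints, and pruning a weight to zero removes its breakpoint. The Python sketch below counts these regions under stated simplifications (one layer, scalar input, regions counted by activation pattern, so adjacent regions that happen to compute the same affine function are not merged); it is an illustration, not the bound derived in the paper.

```python
from typing import Sequence


def activation_regions_1d(weights: Sequence[float], biases: Sequence[float]) -> int:
    """Count activation regions of x -> [relu(w_j * x + b_j)]_j over the real line.

    Units with w_j == 0 (pruned) are constant in x and add no breakpoint.
    """
    breakpoints = {-b / w for w, b in zip(weights, biases) if w != 0.0}
    return len(breakpoints) + 1


dense = activation_regions_1d([1.0, -1.0, 2.0], [0.0, 1.0, -1.0])
pruned = activation_regions_1d([1.0, 0.0, 2.0], [0.0, 1.0, -1.0])
print(dense, pruned)  # 4 3
```

With $n$ units the count is at most $n+1$; each pruned input weight lowers that ceiling by one, which mirrors, in the simplest possible setting, how sparsity shrinks the maximum number of linear regions.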
-
Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation
Authors:
Son Duy Dao,
Hengcan Shi,
Dinh Phung,
Jianfei Cai
Abstract:
Recent mask proposal models have significantly improved the performance of zero-shot semantic segmentation. However, the use of a 'background' embedding during training in these methods is problematic, as the resulting model tends to over-learn and assign all unseen classes to the background class instead of their correct labels. Furthermore, these methods ignore the semantic relationship among text embeddings, which arguably can be highly informative for zero-shot prediction, as seen classes may be closely related to unseen classes. To this end, this paper proposes novel class enhancement losses that bypass the use of the background embedding during training and simultaneously exploit the semantic relationship between text embeddings and mask proposals by ranking the similarity scores. To further capture the relationship between seen and unseen classes, we propose an effective pseudo-label generation pipeline using a pretrained vision-language model. Extensive experiments on several benchmark datasets show that our method achieves the best overall performance for zero-shot semantic segmentation. Our method is flexible and can also be applied to the challenging open-vocabulary semantic segmentation problem.
Submitted 18 January, 2023;
originally announced January 2023.