Search | arXiv e-print repository

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Authors: Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, Weisi Lin

Abstract: The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of large multi-modality models (LMMs) on a wide range of related fields, in this work, we explore how to teach them for visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA), as well as video quality assessment (VQA) tasks under the original LMM structure. With the syllabus, we further unify the three tasks into one model, termed the OneAlign. In our experiments, we demonstrate the advantage of the discrete-level-based syllabus over direct-score-based variants for LMMs. Our code and the pre-trained weights are released at https://github.com/Q-Future/Q-Align.

Submitted 28 December, 2023; originally announced December 2023.

Comments: Technical Report

arXiv:2312.15632 [pdf, other]

doi 10.1103/PhysRevLett.132.152502

Searching for Two-Neutrino and Neutrinoless Double Beta Decay of $^{134}$Xe with the PandaX-4T Experiment

Authors: PandaX Collaboration, Xiyu Yan, Zhaokan Cheng, Abdusalam Abdukerim, Zihao Bo, Wei Chen, Xun Chen, Chen Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Changbo Fu, Mengting Fu, Lisheng Geng, Karl Giboni, Linhui Gu, Xuyuan Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Yanlin Huang, Junting Huang, Zhou Huang, et al. (72 additional authors not shown)

Abstract: $^{134}$Xe is a candidate isotope for neutrinoless double beta decay ($0νββ$) search. In addition, the two-neutrino case ($2νββ$) allowed by the Standard Model of particle physics has not yet been observed. Utilizing the 10.4% of $^{134}$Xe in the natural xenon in the PandaX-4T detector and its first 94.9-day exposure, we have established the most stringent constraints on $2νββ$ and $0νββ$ of $^{134}$Xe half-lives, with limits of $2.8\times10^{22}$ yr and $3.0\times10^{23}$ yr at 90% confidence level, respectively. The $2νββ$ ($0νββ$) limit surpasses the previously reported best result by a factor of 32 (2.7), highlighting the potential of large monolithic natural xenon detectors.

Submitted 28 April, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

Journal ref: Phys. Rev. Lett. 132 (2024) 15, 152502

arXiv:2312.15354 [pdf, other]

Scout-Net: Prospective Personalized Estimation of CT Organ Doses from Scout Views

Authors: Abdullah-Al-Zubaer Imran, Sen Wang, Debashish Pal, Sandeep Dutta, Bhavik Patel, Evan Zucker, Adam Wang

Abstract: Purpose: Estimation of patient-specific organ doses is required for more comprehensive dose metrics, such as effective dose. Currently, available methods are performed retrospectively using the CT images themselves, which can only be done after the scan. To optimize CT acquisitions before scanning, rapid prediction of patient-specific organ dose is needed prospectively, using available scout images. We, therefore, devise an end-to-end, fully-automated deep learning solution to perform real-time, patient-specific, organ-level dosimetric estimation of CT scans. Approach: We propose the Scout-Net model for CT dose prediction at six different organs as well as for the overall patient body, leveraging the routinely obtained frontal and lateral scout images of patients, before their CT scans. To obtain reference values of the organ doses, we used Monte Carlo simulation and 3D segmentation methods on the corresponding CT images of the patients. Results: We validate our proposed Scout-Net model against real patient CT data and demonstrate the effectiveness in estimating organ doses in real-time (only 27 ms on average per scan). Additionally, we demonstrate the efficiency (real-time execution), sufficiency (reasonable error rates), and robustness (consistent across varying patient sizes) of the Scout-Net model. Conclusions: An effective, efficient, and robust Scout-Net model, once incorporated into the CT acquisition plan, could potentially guide the automatic exposure control for balanced image quality and radiation dose.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 33 pages, 11 figures, 4 tables

arXiv:2312.15072 [pdf]

Engineering Plateau Phase Transition in Quantum Anomalous Hall Multilayers

Authors: Deyi Zhuo, Ling-Jie Zhou, Yi-Fan Zhao, Ruoxi Zhang, Zi-Jie Yan, Annie G. Wang, Moses H. W. Chan, Chao-Xing Liu, Chui-Zhen Chen, Cui-Zu Chang

Abstract: The plateau phase transition in quantum anomalous Hall (QAH) insulators corresponds to a quantum state wherein a single magnetic domain gives way to multiple magnetic domains and then re-converges back to a single magnetic domain. The layer structure of the sample provides an external knob for adjusting the Chern number C of the QAH insulators. Here, we employ molecular beam epitaxy (MBE) to grow magnetic topological insulator (TI) multilayers with an asymmetric layer structure and realize the magnetic field-driven plateau phase transition between two QAH states with odd Chern number change ΔC. In multilayer structures with C = ±1 and C = ±2 QAH states, we find two characteristic power-law behaviors between temperature and the scaling variables on the magnetic field at transition points. The critical exponents extracted for the plateau phase transitions with ΔC = 1 and ΔC = 3 in QAH insulators are found to be nearly identical, specifically, k1 ≈ 0.390 ± 0.021 and k2 ≈ 0.388 ± 0.015, respectively. We construct a four-layer Chalker-Coddington network model to understand the consistent critical exponents for the plateau phase transitions with ΔC = 1 and ΔC = 3. This work will motivate further investigations into the critical behaviors of plateau phase transitions with different ΔC in QAH insulators and provide new opportunities for the development of QAH chiral edge current-based electronic and spintronic devices.

Submitted 14 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 18 pages, 4 figures. Comments are welcome

arXiv:2312.14232 [pdf, other]

Parrot Captions Teach CLIP to Spot Text

Authors: Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou

Abstract: Despite CLIP being the foundation model in numerous vision-language applications, CLIP suffers from a severe text spotting bias. Such bias causes CLIP models to 'parrot' the visual text embedded within images while disregarding the authentic visual semantics. We uncover that in the most popular image-text dataset LAION-2B, the captions also densely parrot (spell) the text embedded in images. Our analysis shows that around 50% of images are embedded with visual text content, and around 30% of caption words appear in this embedded visual content. Based on this observation, we thoroughly inspect the different released versions of CLIP models and verify that the visual text is the dominant factor in measuring the LAION-style image-text similarity for these models. To examine whether these parrot captions shape the text spotting bias, we train a series of CLIP models with LAION subsets curated by different parrot-caption-oriented criteria. We show that training with parrot captions easily shapes such bias but harms the expected visual-language representation learning in CLIP models. This suggests that it is urgent to revisit either the design of CLIP-like models or the existing image-text dataset curation pipeline built on CLIP score filtering.

Submitted 1 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: project page: https://linyq17.github.io/CLIP-Parrot-Bias/. Add more analysis and ablation studies. Update Figure 3 with a more precise metric

arXiv:2312.12713 [pdf, other]

Response Enhanced Semi-supervised Dialogue Query Generation

Authors: Jianheng Huang, Ante Wang, Linfeng Gao, Linfeng Song, Jinsong Su

Abstract: Leveraging vast and continually updated knowledge from the Internet has been considered an important ability for a dialogue system. Therefore, the dialogue query generation task is proposed for generating search queries from dialogue histories, which will be submitted to a search engine for retrieving relevant websites on the Internet. In this regard, previous efforts were devoted to collecting conversations with annotated queries and training a query producer (QP) via standard supervised learning. However, these studies still face the challenges of data scarcity and domain adaptation. To address these issues, in this paper, we propose a semi-supervised learning framework, SemiDQG, to improve model performance with unlabeled conversations. Based on the observation that the search query is typically related to the topic of dialogue response, we train a response-augmented query producer (RA) to provide rich and effective training signals for QP. We first apply a similarity-based query selection strategy to select high-quality RA-generated pseudo queries, which are used to construct pseudo instances for training QP and RA. Then, we adopt the REINFORCE algorithm to further enhance QP, with RA-provided rewards as fine-grained training signals. Experimental results and in-depth analysis of three benchmarks show the effectiveness of our framework in cross-domain and low-resource scenarios. Particularly, SemiDQG significantly surpasses ChatGPT and competitive baselines. Our code is available at https://github.com/DeepLearnXMU/SemiDQG.

Submitted 15 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: AAAI-24 main track paper

arXiv:2312.11689 [pdf, ps, other]

Weak Poincaré Inequalities for Markov chains: theory and applications

Authors: Christophe Andrieu, Anthony Lee, Sam Power, Andi Q. Wang

Abstract: We investigate the application of Weak Poincaré Inequalities (WPI) to Markov chains to study their rates of convergence and to derive complexity bounds. At a theoretical level we investigate the necessity of the existence of WPIs to ensure $\mathrm{L}^{2}$-convergence, in particular by establishing equivalence with the Resolvent Uniform Positivity-Improving (RUPI) condition and providing a counterexample. From a more practical perspective, we extend the celebrated Cheeger's inequalities to the subgeometric setting, and further apply these techniques to study random-walk Metropolis algorithms for heavy-tailed target distributions and to obtain lower bounds on pseudo-marginal algorithms.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11072 [pdf, other]

doi 10.1088/1674-1137/ad380f

Waveform Simulation in PandaX-4T

Authors: Jiafu Li, Abdusalam Abdukerim, Chen Cheng, Zihao Bo, Wei Chen, Xun Chen, Yunhua Chen, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Changbo Fu, Mengting Fu, Lisheng Geng, Karl Giboni, Linhui Gu, Xuyuan Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Yanlin Huang, Zhou Huang, Ruquan Hou, et al. (66 additional authors not shown)

Abstract: Signal reconstruction through software processing is a crucial component of the background and signal models in the PandaX-4T experiment, which is a multi-tonne dark matter direct search experiment. The accuracy of signal reconstruction is influenced by various detector artifacts, including noise, dark count of photomultiplier, impurity photoionization in the detector, and other relevant considerations. In this study, we present a detailed description of a semi-data-driven approach designed to simulate the signal waveform. This work provides a reliable model for the efficiency and bias of the signal reconstruction in the data analysis of PandaX-4T. By comparing critical variables which relate to the temporal shape and hit pattern of the signals, we demonstrate a good agreement between the simulation and data.

Submitted 21 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Journal ref: Chin. Phys. C 48, no. 7, 073001 (2024)

arXiv:2312.09336 [pdf, other]

doi 10.1145/3576915.3623065

DECLASSIFLOW: A Static Analysis for Modeling Non-Speculative Knowledge to Relax Speculative Execution Security Measures (Full Version)

Authors: Rutvik Choudhary, Alan Wang, Zirui Neil Zhao, Adam Morrison, Christopher W. Fletcher

Abstract: Speculative execution attacks undermine the security of constant-time programming, the standard technique used to prevent microarchitectural side channels in security-sensitive software such as cryptographic code. Constant-time code must therefore also deploy a defense against speculative execution attacks to prevent leakage of secret data stored in memory or the processor registers. Unfortunately, contemporary defenses, such as speculative load hardening (SLH), can only satisfy this strong security guarantee at a very high performance cost. This paper proposes DECLASSIFLOW, a static program analysis and protection framework to efficiently protect constant-time code from speculative leakage. DECLASSIFLOW models "attacker knowledge" -- data which is inherently transmitted (or, implicitly declassified) by the code's non-speculative execution -- and statically removes protection on such data from points in the program where it is already guaranteed to leak non-speculatively. Overall, DECLASSIFLOW ensures that data which never leaks during the non-speculative execution does not leak during speculative execution, but with lower overhead than conservative protections like SLH.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS '23). Association for Computing Machinery, New York, NY, USA, 2053-2067

arXiv:2312.07636 [pdf, other]

Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply

Authors: Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li

Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains them in a supervised manner based on local preliminary losses, thereby providing asynchronous and parallel training methods that substantially reduce memory cost. However, empirical experiments reveal that as the number of segmentations of the gradient-isolated module increases, the performance of the local learning scheme degrades substantially, severely limiting its expansibility. To avoid this issue, we theoretically analyze the greedy local learning from the standpoint of information theory and propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss. Experiments on benchmark datasets (i.e. CIFAR, SVHN, STL-10) achieve SOTA results and indicate that our proposed method can significantly improve the performance of greedy local learning with minimal memory and computational overhead, allowing for the boost of the number of isolated modules. Our codes are available at https://github.com/Tab-ct/ContSup.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 9 figures, 12 tables

arXiv:2312.05832 [pdf, other]

Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains

Authors: Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren

Abstract: Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. Because of the modeling shortcomings of spatial invariance and pooling layers, conventional CNNs often neglect crucial global information, resulting in erroneous localization in fault detection tasks for freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptron (MLP) for visual fault detection of freight trains. We initially present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We propose a dynamic distillation method without a pre-training teacher, including a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. Such an approach mines more abundant details from lower-level feature appearances and higher-level label semantics as the extra supervision signal, which utilizes efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can jointly train with students to further enhance the distillation efficiency. Extensive experiments executed on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at https://github.com/MVME-HBUT/SDD-FTI-FDet.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 10 pages, 6 figures

arXiv:2312.05760 [pdf, other]

RepViT-SAM: Towards Real-Time Segmenting Anything

Authors: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

Abstract: Segment Anything Model (SAM) has shown impressive zero-shot transfer performance for various computer vision tasks recently. However, its heavy computation costs remain daunting for practical applications. MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation, which results in a significant reduction in computational requirements. However, its deployment on resource-constrained mobile devices still encounters challenges due to the substantial memory and computational overhead caused by self-attention mechanisms. Recently, RepViT achieves the state-of-the-art performance and latency trade-off on mobile devices by incorporating efficient architectural designs of ViTs into CNNs. Here, to achieve real-time segmenting anything on mobile devices, following MobileSAM, we replace the heavyweight image encoder in SAM with RepViT model, ending up with the RepViT-SAM model. Extensive experiments show that RepViT-SAM can enjoy significantly better zero-shot transfer capability than MobileSAM, along with nearly $10\times$ faster inference speed. The code and models are available at https://github.com/THU-MIG/RepViT.

Submitted 29 February, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: Technical report of RepViT+SAM in our CVPR 2024 work. Project page: https://jameslahm.github.io/repvit-sam/

arXiv:2312.03344 [pdf, other]

Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

Authors: Ke Alexander Wang, Emily B. Fox

Abstract: Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of CGM and meal data. Our method grounds the latent space to the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities, such as insulin sensitivity, glucose effectiveness, and basal glucose levels. Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced, yet interpretable, embedding space to compare glycemic control within and across individuals, directly learnable from in-the-wild data.

Submitted 6 December, 2023; originally announced December 2023.

Comments: Proceedings of Machine Learning for Health (ML4H) 2023. Code available at: https://github.com/KeAWang/interpretable-cgm-representations

arXiv:2312.02030 [pdf, other]

doi 10.1103/PhysRevD.109.092003

First Constraints on WIMP-Nucleon Effective Field Theory Couplings in an Extended Energy Region From LUX-ZEPLIN

Authors: LZ Collaboration, J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, A. Baxter, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, et al. (175 additional authors not shown)

Abstract: Following the first science results of the LUX-ZEPLIN (LZ) experiment, a dual-phase xenon time projection chamber operating from the Sanford Underground Research Facility in Lead, South Dakota, USA, we report the initial limits on a model-independent non-relativistic effective field theory describing the complete set of possible interactions of a weakly interacting massive particle (WIMP) with a nucleon. These results utilize the same 5.5 t fiducial mass and 60 live days of exposure collected for the LZ spin-independent and spin-dependent analyses while extending the upper limit of the energy region of interest by a factor of 7.5 to 270 keVnr. No significant excess in this high energy region is observed. Using a profile-likelihood ratio analysis, we report 90% confidence level exclusion limits on the coupling of each individual non-relativistic WIMP-nucleon operator for both elastic and inelastic interactions in the isoscalar and isovector bases.

Submitted 26 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 17 pages, 11 figures

Journal ref: Phys. Rev. D 109, 092003 (2024)

arXiv:2312.01947 [pdf, other]

doi 10.1103/PhysRevResearch.6.023098

Maximising Quantum-Computing Expressive Power through Randomised Circuits

Authors: Yingli Yang, Zongkang Zhang, Anbang Wang, Xiaosi Xu, Xiaoting Wang, Ying Li

Abstract: In the noisy intermediate-scale quantum era, variational quantum algorithms (VQAs) have emerged as a promising avenue to obtain quantum advantage. However, the success of VQAs depends on the expressive power of parameterised quantum circuits, which is constrained by the limited gate number and the presence of barren plateaus. In this work, we propose and numerically demonstrate a novel approach for VQAs, utilizing randomised quantum circuits to generate the variational wavefunction. We parameterize the distribution function of these random circuits using artificial neural networks and optimize it to find the solution. This random-circuit approach presents a trade-off between the expressive power of the variational wavefunction and time cost, in terms of the sampling cost of quantum circuits. Given a fixed gate number, we can systematically increase the expressive power by extending the quantum-computing time. With a sufficiently large permissible time cost, the variational wavefunction can approximate any quantum state with arbitrary accuracy. Furthermore, we establish explicit relationships between expressive power, time cost, and gate number for variational quantum eigensolvers. These results highlight the promising potential of the random-circuit approach in achieving a high expressive power in quantum computing.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 19 pages, 10 figures

Journal ref: Phys. Rev. Research 6, 023098 (2024)

arXiv:2312.01566 [pdf, other]

Coronary Atherosclerotic Plaque Characterization with Photon-counting CT: a Simulation-based Feasibility Study

Authors: Mengzhou Li, Mingye Wu, Jed Pack, Pengwei Wu, Bruno De Man, Adam Wang, Koen Nieman, Ge Wang

Abstract: Recent development of photon-counting CT (PCCT) brings great opportunities for plaque characterization with much-improved spatial resolution and spectral imaging capability. While existing coronary plaque PCCT imaging results are based on detectors made of CZT or CdTe materials, deep-silicon photon-counting detectors have unique performance characteristics and promise distinct imaging capabilities. In this work, we report a systematic simulation study of a deep-silicon PCCT scanner with a new clinically-relevant digital plaque phantom with realistic geometrical parameters and chemical compositions. This work investigates the effects of spatial resolution, noise, motion artifacts, radiation dose, and spectral characterization. Our simulation results suggest that the deep-silicon PCCT design provides adequate spatial resolution for visualizing a necrotic core and quantitation of key plaque features. Advanced denoising techniques and aggressive bowtie filter designs can keep image noise to acceptable levels at this resolution while keeping radiation dose comparable to that of a conventional CT scan. The ultrahigh resolution of PCCT also means an elevated sensitivity to motion artifacts. It is found that a tolerance of less than 0.4 mm residual movement range requires the application of accurate motion correction methods for best plaque imaging quality with PCCT.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 13 figures, 5 tables

arXiv:2312.01269 [pdf]

doi 10.1016/j.scib.2023.10.008

Room-temperature orbit-transfer torque enabling van der Waals magnetoresistive memories

Authors: Zhen-Cun Pan, Dong Li, Xing-Guo Ye, Zheng Chen, Zhao-Hui Chen, An-Qi Wang, Mingliang Tian, Guangjie Yao, Kaihui Liu, Zhi-Min Liao

Abstract: The nonvolatile magnetoresistive random access memory (MRAM) is believed to facilitate emerging applications, such as in memory computing, neuromorphic computing and stochastic computing. Two dimensional (2D) materials and their van der Waals heterostructures promote the development of MRAM technology, due to their atomically smooth interfaces and tunable physical properties. Here we report the all-2D magnetoresistive memories featuring all electrical data reading and writing at room temperature based on WTe2/Fe3GaTe2/BN/Fe3GaTe2 heterostructures. The data reading process relies on the tunnel magnetoresistance of Fe3GaTe2/BN/Fe3GaTe2. The data writing is achieved through current induced polarization of orbital magnetic moments in WTe2, which exert torques on Fe3GaTe2, known as the orbit transfer torque (OTT) effect. In contrast to the conventional reliance on spin moments in spin transfer torque and spin orbit torque, the OTT effect leverages the natural out of plane orbital moments, facilitating field-free perpendicular magnetization switching through interface currents. Our results indicate that the emerging OTT MRAM is promising for low power, high performance memory applications.

Submitted 2 December, 2023; originally announced December 2023.

Journal ref: Science Bulletin 68, 2743 (2023)

arXiv:2312.01263 [pdf]

doi 10.1103/PhysRevLett.131.186302

Gate-Tunable Berry Curvature Dipole Polarizability in Dirac Semimetal Cd3As2

Authors: Tong-Yang Zhao, An-Qi Wang, Xing-Guo Ye, Xing-Yu Liu, Xin Liao, Zhi-Min Liao

Abstract: We reveal the gate-tunable Berry curvature dipole polarizability in Dirac semimetal Cd3As2 nanoplates through measurements of the third-order nonlinear Hall effect. Under an applied electric field, the Berry curvature exhibits an asymmetric distribution, forming a field-induced Berry curvature dipole, resulting in a measurable third-order Hall voltage with a cubic relationship to the longitudinal electric field. Notably, the magnitude and polarity of this third-order nonlinear Hall effect can be effectively modulated by gate voltages. Furthermore, our scaling relation analysis demonstrates that the sign of the Berry curvature dipole polarizability changes when tuning the Fermi level across the Dirac point, in agreement with theoretical calculations. The results highlight the gate control of nonlinear quantum transport in Dirac semimetals, paving the way for promising advancements in topological electronics.

Submitted 2 December, 2023; originally announced December 2023.

Journal ref: Phys. Rev. Lett. 131, 186302 (2023)

arXiv:2312.01030 [pdf, ps, other]

On the Analytic Langlands Correspondence for $\operatorname{PGL}_2$ in Genus 0 with Wild Ramification

Authors: Daniil Klyuev, Atticus Wang

Abstract: The analytic Langlands correspondence was developed by Etingof, Frenkel and Kazhdan in arXiv:1908.09677, arXiv:2103.01509, arXiv:2106.05243, arXiv:2311.03743. For a curve $X$ and a group $G$ over a local field $F$, in the tamely ramified setting one considers the variety $\operatorname{Bun}_G$ of stable $G$-bundles on $X$ with Borel reduction at a finite subset $S\subset X$ of points. On one side of this conjectural correspondence there are Hecke operators on $L^2(\operatorname{Bun}_G)$, the Hilbert space of square-integrable half-densities on $\operatorname{Bun}_G$; on the other side there are certain opers with regular singularities at $S$. In this paper we prove the main conjectures of analytic Langlands correspondence in the case $G = \operatorname{PGL}_2$, $X=\mathbb{P}^1_{\mathbb{C}}$ with wild ramification, i.e. when several points in $S$ are collided together.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 15 pages

arXiv:2312.00164 [pdf, other]

Towards Accurate Differential Diagnosis with Large Language Models

Authors: Daniel McDuff, Mike Schaekermann, Tao Tu, Anil Palepu, Amy Wang, Jake Garrison, Karan Singhal, Yash Sharma, Shekoofeh Azizi, Kavita Kulkarni, Le Hou, Yong Cheng, Yun Liu, S Sara Mahdavi, Sushant Prakash, Anupam Pathak, Christopher Semturs, Shwetak Patel, Dale R Webster, Ewa Dominowska, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias , et al. (3 additional authors not shown)

Abstract: An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance.
Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.18761 [pdf, other]

Can training neural language models on a curriculum with developmentally plausible data improve alignment with human reading behavior?

Authors: Aryaman Chobey, Oliver Smith, Anzi Wang, Grusha Prasad

Abstract: The use of neural language models to model human behavior has met with mixed success. While some work has found that the surprisal estimates from these models can be used to predict a wide range of human neural and behavioral responses, other work studying more complex syntactic phenomena has found that these surprisal estimates generate incorrect behavioral predictions. This paper explores the extent to which the misalignment between empirical and model-predicted behavior can be minimized by training models on more developmentally plausible data, such as in the BabyLM Challenge. We trained teacher language models on the BabyLM "strict-small" dataset and used sentence level surprisal estimates from these teacher models to create a curriculum. We found tentative evidence that our curriculum made it easier for models to acquire linguistic knowledge from the training data: on the subset of tasks in the BabyLM challenge suite evaluating models' grammatical knowledge of English, models first trained on the BabyLM data curriculum and then on a few randomly ordered training epochs performed slightly better than models trained on randomly ordered epochs alone. This improved linguistic knowledge acquisition did not result in better alignment with human reading behavior, however: models trained on the BabyLM dataset (with or without a curriculum) generated predictions that were as misaligned with human behavior as models trained on larger less curated datasets.
This suggests that training on developmentally plausible datasets alone is likely insufficient to generate language models capable of accurately predicting human language processing.

Submitted 30 November, 2023; originally announced November 2023.

Comments: To appear in the proceedings of BabyLM shared task CoNLL 2023

arXiv:2311.18148 [pdf]

A universal optical modulator for synthetic topologically tuneable structured matter

Authors: Chao He, Binguo Chen, Zipei Song, Zimo Zhao, Yifei Ma, Honghui He, Lin Luo, Tade Marozsak, An Wang, Rui Xu, Peixiang Huang, Xuke Qiu, Bangshan Sun, Jiahe Cui, Yuxi Cai, Yun Zhang, Patrick Salter, Julian AJ Fells, Ben Dai, Shaoxiong Liu, Limei Guo, Hui Ma, Steve J Elston, Qiwen Zhan, Chengwei Qiu , et al. (3 additional authors not shown)

Abstract: Topologically structured matter, such as metasurfaces and metamaterials, has given rise to impressive photonic functionality, fuelling diverse applications from microscopy and holography to encryption and communication. Presently these solutions are limited by their largely static nature and preset functionality, hindering applications that demand dynamic photonic systems with reconfigurable topologies. Here we demonstrate a universal optical modulator that implements topologically tuneable structured matter as virtual pixels derived from cascading low functionality tuneable devices, altering the paradigm of phase and amplitude control to encompass arbitrary spatially varying retarders in a synthetic structured matter device. Our approach opens unprecedented functionality that is user-defined with high flexibility, allowing our synthetic structured matter to act as an information carrier, beam generator, analyser, and corrector, opening an exciting path to tuneable topologies of light and matter.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15657 [pdf, other]

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Authors: Chaofeng Chen, Annan Wang, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: Text-to-image diffusion models are typically trained to optimize the log-likelihood objective, which presents challenges in meeting specific requirements for downstream tasks, such as image aesthetics and image-text alignment. Recent research addresses this issue by refining the diffusion U-Net using human rewards through reinforcement learning or direct backpropagation. However, many of them overlook the importance of the text encoder, which is typically pretrained and fixed during training. In this paper, we demonstrate that by finetuning the text encoder through reinforcement learning, we can enhance the text-image alignment of the results, thereby improving the visual quality. Our primary motivation comes from the observation that the current text encoder is suboptimal, often requiring careful prompt adjustment. While fine-tuning the U-Net can partially improve performance, it still suffers from the suboptimal text encoder. Therefore, we propose to use reinforcement learning with low-rank adaptation to finetune the text encoder based on task-specific rewards, referred to as TexForce. We first show that finetuning the text encoder can improve the performance of diffusion models. Then, we illustrate that TexForce can be simply combined with existing U-Net finetuned models to get much better results without additional training. Finally, we showcase the adaptability of our method in diverse applications, including the generation of high-quality face and hand images.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.13101 [pdf, ps, other]

doi 10.1088/1674-1137/ad0f13

Analysis of the data for $γp \to f_1(1285) p$ photoproduction

Authors: Ai-Chao Wang, Neng-Chang Wei, Fei Huang

Abstract: The photoproduction of the $f_1(1285)$ meson off the proton is investigated within an effective Lagrangian approach. The $t$-channel $ρ$- and $ω$-exchange diagrams, $u$-channel nucleon-exchange diagram, generalized contact term, and $s$-channel pole diagrams of nucleon and a minimal number of nucleon resonances are taken into account in constructing the reaction amplitudes to describe the experimental data. Three different models, i.e., the Feynman model, the Regge model, and the interpolated Regge model, are employed where the $t$-channel reaction amplitudes are constructed in Feynman type, Regge type, and interpolated Regge type, respectively. The results show that neither the Feynman model with two nucleon resonances introduced nor the interpolated Regge model with one nucleon resonance included can satisfactorily reproduce the available data for $γp \to f_1(1285) p$. Nevertheless, in the Regge model, when any one of the $N(1990){7/2}^+$, $N(2000){5/2}^+$, $N(2040){3/2}^+$, $N(2060){5/2}^-$, $N(2100){1/2}^+$, $N(2120){3/2}^-$, $N(2190){7/2}^-$, $N(2300){1/2}^+$, and $N(2570){5/2}^-$ resonances is considered, the data can be well described. The resulting resonance parameters are consistent with those advocated in the Particle Data Group (PDG) review.
Further analysis shows that in the high-energy region, the peaks of $γp \to f_1(1285) p$ differential cross sections at forward angles are dominated by the contributions from $t$-channel $ρ$- and $ω$-exchange diagrams, while in the low-energy region, the $s$-channel pole diagrams of resonances also provide significant contributions to the $γp \to f_1(1285) p$ cross sections.

Submitted 21 November, 2023; originally announced November 2023.

Comments: This paper is to be published in Chinese Physics C

Journal ref: Chin. Phys. C 48 (2024) 024105

arXiv:2311.12259 [pdf, ps, other]

doi 10.1016/j.physletb.2024.138797

Analytical models of supermassive black holes in galaxies surrounded by dark matter halos

Authors: Zibo Shen, Anzhong Wang, Yungui Gong, Shaoyu Yin

Abstract: In this Letter, we present five analytical models in closed forms, each representing a supermassive black hole (SMBH) located at the center of a galaxy surrounded by a dark matter (DM) halo. The density profile of the halo vanishes inside twice the Schwarzschild radius of the hole and satisfies the weak, strong, and dominant energy conditions. The spacetimes are asymptotically flat, and the difference among the models lies in the slopes of the density profiles in the spike and regions far from the center of the galaxy. Three of them represent cusp models, whereas the other two represent core models. With the well-known (generalized) Newman-Janis algorithm, rotating SMBHs with DM halos can be easily constructed from these models.

Submitted 19 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: revtex4-2, no figures. Version to appear in Phys. Lett. B 855 (2024) 138797

Journal ref: Phys. Lett. B 855 (2024) 138797

arXiv:2311.10959 [pdf, other]

Structure-Aware Sparse-View X-ray 3D Reconstruction

Authors: Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang

Abstract: X-ray, known for its ability to reveal internal structures of objects, is expected to provide richer information for 3D reconstruction than visible light. Yet, existing neural radiance fields (NeRF) algorithms overlook this important nature of X-ray, leading to their limitations in capturing structural contents of imaged objects. In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction. Firstly, we design a Line Segment-based Transformer (Lineformer) as the backbone of SAX-NeRF. Lineformer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray. Secondly, we present a Masked Local-Global (MLG) ray sampling strategy to extract contextual and geometric information in 2D projection. Plus, we collect a larger-scale dataset X3D covering wider X-ray applications. Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction. Code, models, and data are released at https://github.com/caiyuanhao1998/SAX-NeRF

Submitted 23 March, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: CVPR 2024; The first Transformer-based method for X-ray and CT 3D reconstruction

arXiv:2311.10166 [pdf, other]

doi 10.1103/PhysRevD.109.026015

Revisiting quantum black holes from effective loop quantum gravity

Authors: Geeth Ongole, Parampreet Singh, Anzhong Wang

Abstract: We systematically study a family of loop quantizations for the classical Kruskal spacetimes using the effective description motivated from loop quantum gravity for four generic parameters, $c_o, m, δ_b$, and $δ_c$, where the latter two denote the polymerization parameters that capture the underlying quantum geometry. We focus on the family where polymerization parameters are constant on dynamical trajectories and of which the Ashtekar-Olmedo-Singh (AOS) and Corichi-Singh (CS) models appear as special cases. We study general features of singularity resolution in all these models due to quantum gravity effects and analytically extend the solutions across the white hole (WH) and black hole (BH) horizons to the exterior. We find that the leading term in the asymptotic expansion of the Kretschmann scalar is $r^{-4}$. However, for AOS and CS models, black holes with masses greater than solar mass, the dominant term behaves as $r^{-6}$ for the size of the observable Universe and our analysis can be used to phenomenologically constrain the choice of parameters for other models. In addition, one can uniquely fix the parameter $c_o$ by requiring that the Hawking temperature at the BH horizon to the leading order be consistent with its classical value for a macroscopic BH. Assuming that the BH and WH masses are of the same order, we are able to identify a family of choices of $δ_b$ and $δ_c$ which share all the desired properties of the AOS model.

Submitted 7 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: pdfLatex, 14 pages, 6 figures

Journal ref: Phys. Rev. D 109, 026015 (2024)

arXiv:2311.07048 [pdf, ps, other]

Gauss-Euler Primality Test

Authors: Almas Wang

Abstract: This paper presents two efficient primality tests that quickly and accurately test all integers up to $2^{64}$.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 10 pages, C++ code included

MSC Class: 11A41; 11-04; 11Y11
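
For readers who want a concrete baseline for what an exact primality test over this range looks like, the sketch below uses a different, standard technique: the Miller-Rabin test run with the first twelve primes as fixed bases, a base set known to give exact answers well beyond $2^{64}$. It is not the Gauss-Euler method of the paper, whose details are not given in the abstract; the function name `is_prime_u64` is an illustrative assumption.

```python
# Deterministic Miller-Rabin for integers below 2**64.
# NOTE: this is NOT the paper's Gauss-Euler test; it is a standard
# baseline. The function name is hypothetical.

# The first twelve primes. Used as Miller-Rabin bases, they are known to
# classify every integer below 2**64 correctly.
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime_u64(n: int) -> bool:
    """Return True if and only if n is prime, for 0 <= n < 2**64."""
    if n < 2:
        return False
    # Trial division by the small primes also handles n in _BASES.
    for p in _BASES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue  # base a does not witness compositeness
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True
```

Python integers are arbitrary precision, so the modular squaring needs no 128-bit workaround here; a C++ port, like the code the paper's comments mention, would need one.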

arXiv:2311.06783 [pdf, other]

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, that can respond to a broad range of natural human instructions in a model. While existing foundation models have shown exciting potentials on low-level visual tasks, their related abilities are still preliminary and need to be improved. In order to enhance these models, we conduct a large-scale subjective experiment collecting a vast number of real human feedbacks on low-level vision. Each feedback follows a pathway that starts with a detailed description on the low-level visual appearance (*e.g.* clarity, color, brightness) of an image, and ends with an overall conclusion, with an average length of 45 words. The constructed **Q-Pathway** dataset includes 58K detailed human feedbacks on 18,973 images with diverse low-level appearance. Moreover, to enable foundation models to robustly respond to diverse types of questions, we design a GPT-participated conversion to process these feedbacks into diverse-format 200K instruction-response pairs. Experimental results indicate that the **Q-Instruct** consistently elevates low-level perception and understanding abilities across several foundational models. We anticipate that our datasets can pave the way for a future in which general intelligence can perceive and understand low-level visual appearance and evaluate visual quality like a human. Our dataset, model zoo, and demo are published at: https://q-future.github.io/Q-Instruct.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 16 pages, 11 figures, page 12-16 as appendix

arXiv:2311.05608 [pdf, other]

FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts

Authors: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang

Abstract: Ensuring the safety of artificial intelligence-generated content (AIGC) is a longstanding topic in the artificial intelligence (AI) community, and the safety concerns associated with Large Language Models (LLMs) have been widely investigated. Recently, large vision-language models (VLMs) represent an unprecedented revolution, as they are built upon LLMs but can incorporate additional modalities (e.g., images). However, the safety of VLMs lacks systematic evaluation, and there may be an overconfidence in the safety guarantees provided by their underlying LLMs. In this paper, to demonstrate that introducing additional modality modules leads to unforeseen AI safety issues, we propose FigStep, a straightforward yet effective jailbreaking algorithm against VLMs. Instead of feeding textual harmful instructions directly, FigStep converts the harmful content into images through typography to bypass the safety alignment within the textual module of the VLMs, inducing VLMs to output unsafe responses that violate common AI safety policies. In our evaluation, we manually review 46,500 model responses generated by 3 families of the promising open-source VLMs, i.e., LLaVA, MiniGPT4, and CogVLM (a total of 6 VLMs). The experimental results show that FigStep can achieve an average attack success rate of 82.50% on 500 harmful queries in 10 topics. Moreover, we demonstrate that the methodology of FigStep can even jailbreak GPT-4V, which already leverages an OCR detector to filter harmful queries.
Above all, our work reveals that VLMs are vulnerable to jailbreaking attacks, which highlights the necessity of novel safety alignments between visual and textual modalities.

Submitted 13 December, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: Technical Report

arXiv:2311.05467 [pdf]

Lithium-ion battery performance model including solvent segregation effects

Authors: Ruihe Li, Simon O'Kane, Andrew Wang, Taeho Jung, Niall Kirkaldy, Monica Marinescu, Charles W. Monroe, Gregory J. Offer

Abstract: A model of a lithium-ion battery containing a cosolvent electrolyte is developed and implemented within the open-source PyBaMM platform. Lithium-ion electrolytes are essential to battery operation and normally contain at least two solvents to satisfy performance requirements. The widely used Doyle-Fuller-Newman battery model assumes that the electrolyte comprises a salt dissolved in a single effective solvent, however. This single-solvent approximation has been disproved experimentally and may hinder accurate battery modelling. Here, we present a two-solvent model that resolves the transport of ethylene carbonate (EC) and lithium salt in a background linear carbonate. EC concentration polarization opposes that of Li+ during cycling, affecting local electrolyte properties and cell-level overpotentials. Concentration gradients of Li+ can be affected by cross-diffusion, whereby EC gradients enhance or impede salt flux. A rationally parametrized model that includes EC transport predicts 6% more power loss at 4.5C discharge and ~0.32% more capacity loss after a thousand 1C cycles than its single-solvent equivalent. This work provides a tool to model more transport behaviour in the electrolyte that may affect degradation and enables the transfer of microscopic knowledge about solvation structure-dependent performance to the macroscale.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.02943 [pdf]

Sub-5-nm Ultra-thin In$_2$O$_3$ Transistors for High-Performance and Low-Power Electronic Applications

Authors: Linqiang Xu, Lianqiang Xu, Jun Lan, Yida Li, Qiuhui Li, Aili Wang, Ying Guo, Yee Sin Ang, Ruge Quhe, Jing Lu

Abstract: Ultra-thin (UT) oxide semiconductors are promising candidates for back-end-of-line (BEOL) compatible transistors and monolithic three-dimensional integration. Experimentally, UT indium oxide (In$_2$O$_3$) field-effect transistors (FETs) with thicknesses down to 0.4 nm exhibit extremely high drain current (10000 $μ$A/$μ$m) and transconductance (4000 $μ$S/$μ$m). Here, we employ the ab initio quantum transport simulation to investigate the performance limit of sub-5-nm gate length (Lg) UT In$_2$O$_3$ FETs. Based on the International Technology Roadmap for Semiconductors (ITRS) criteria for high-performance (HP) devices, the scaling limit of UT In$_2$O$_3$ FETs can reach 2 nm in terms of on-state current, delay time, and power dissipation. The wide bandgap nature of UT In$_2$O$_3$ (3.15 eV) renders it a suitable candidate for ITRS low-power (LP) electronics with Lg down to 3 nm. Both the HP and LP UT In$_2$O$_3$ FETs exhibit superior energy-delay products as compared to other common 2D semiconductors such as monolayer MoS2 and MoTe2. Our study unveils the immense promise of UT In$_2$O$_3$ for both HP and LP device applications.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 16 pages, 7 figures

arXiv:2311.02401 [pdf, other]

BarcodeBERT: Transformers for Biodiversity Analysis

Authors: Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Scott C. Lowe, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Graham W. Taylor

Abstract: Understanding biodiversity is a global challenge, in which DNA barcodes - short snippets of DNA that cluster by species - play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across datasets of varying complexity. While simpler datasets and tasks favor supervised CNNs or fine-tuned transformers, challenging species-level identification demands a paradigm shift towards self-supervised pretraining. We propose BarcodeBERT, the first self-supervised method for general biodiversity analysis, leveraging a 1.5 M invertebrate DNA barcode reference library. This work highlights how dataset specifics and coverage impact model selection, and underscores the role of self-supervised pretraining in achieving high-accuracy DNA barcode-based identification at the species and genus level. Indeed, without the fine-tuning step, BarcodeBERT pretrained on a large DNA barcode dataset outperforms DNABERT and DNABERT-2 on multiple downstream classification tasks. The code repository is available at https://github.com/Kari-Genomics-Lab/BarcodeBERT

Submitted 4 November, 2023; originally announced November 2023.

Comments: Main text: 5 pages, Total: 9 pages, 2 figures, accepted at the 4th Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2023)

arXiv:2310.20365 [pdf]

doi 10.1063/5.0187742

Physical-layer key distribution using synchronous complex dynamics of DBR semiconductor lasers

Authors: Anbang Wang, Yicheng Du, Qingtian Li, Longsheng Wang, Zhiwei Jia, Yuwen Qin, Yuncai Wang

Abstract: Common-signal-induced synchronization of semiconductor lasers with optical feedback has inspired a promising physical key distribution scheme with information-theoretic security and the potential for high rates. A significant challenge is the requirement to shorten the synchronization recovery time to increase the key rate without sacrificing the operation parameter space needed for security. Here, open-loop synchronization of wavelength-tunable multi-section distributed Bragg reflector (DBR) lasers is proposed as a solution for physical-layer key distribution. Experiments show that the synchronization is sensitive to two operation parameters, i.e., the currents of the grating section and the phase section. Furthermore, fast wavelength-shift keying synchronization can be achieved by direct modulation of one of the two currents. The synchronization recovery time is shortened by one order of magnitude compared to closed-loop synchronization. An experimental implementation is demonstrated with a final key rate of 5.98 Mbit/s over 160 km optical fiber distance. It is thus believed that fast-tunable multi-section semiconductor lasers open a new avenue for high-rate physical-layer key distribution using laser synchronization.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 13 pages, 5 figures

Journal ref: APL Photonics 9, 036109 (2024)

arXiv:2310.20079 [pdf, other]

Hybridizing Physics and Neural ODEs for Predicting Plasma Inductance Dynamics in Tokamak Fusion Reactors

Authors: Allen M. Wang, Darren T. Garnier, Cristina Rea

Abstract: While fusion reactors known as tokamaks hold promise as a firm energy source, advances in plasma control, and handling of events where control of plasmas is lost, are needed for them to be economical. A significant bottleneck towards applying more advanced control algorithms is the need for better plasma simulation, where both physics-based and data-driven approaches currently fall short. The former is bottle-necked by both computational cost and the difficulty of modelling plasmas, and the latter is bottle-necked by the relative paucity of data. To address this issue, this work applies the neural ordinary differential equations (ODE) framework to the problem of predicting a subset of plasma dynamics, namely the coupled plasma current and internal inductance dynamics. As the neural ODE framework allows for the natural inclusion of physics-based inductive biases, we train both physics-based and neural network models on data from the Alcator C-Mod fusion reactor and find that a model that combines physics-based equations with a neural ODE performs better than both existing physics-motivated ODEs and a pure neural ODE model.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.19295 [pdf, other]

ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout

Authors: Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu

Abstract: As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM, which operates at the computation graph level to derive a memory-efficient execution plan with optimized operator order and tensor memory layout for models. We first propose sophisticated theories that carefully consider model structure and training memory load to support optimization for large complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, along with delivering high performance and effectiveness to solve the problem. Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to PyTorch and two state-of-the-art methods and offers a remarkable 53.7x speedup. The evaluation conducted on the expansive GPT2-XL further validates ROAM's scalability.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.19180 [pdf, other]

JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation

Authors: Yao Yao, Peike Li, Boyu Chen, Alex Wang

Abstract: With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation from scratch. However, finer-grained control over multi-track generation remains an open challenge. Existing models exhibit strong raw generation capability but lack the flexibility to compose separate tracks and combine them in a controllable manner, differing from typical workflows of human composers. To address this issue, we propose JEN-1 Composer, a unified framework to efficiently model marginal, conditional, and joint distributions over multi-track music via a single model. The JEN-1 Composer framework can seamlessly incorporate any diffusion-based music generation system, e.g., Jen-1, enhancing its capacity for versatile multi-track music generation. We introduce a curriculum training strategy aimed at incrementally instructing the model in the transition from single-track generation to the flexible generation of multi-track combinations. During inference, users can iteratively produce and choose music tracks that meet their preferences, subsequently creating an entire musical composition incrementally following the proposed Human-AI co-composition workflow. Quantitative and qualitative assessments demonstrate state-of-the-art performance in controllable and high-fidelity multi-track music synthesis. The proposed JEN-1 Composer represents a significant advance toward interactive AI-facilitated music creation and composition. Demos will be available at https://www.jenmusic.ai/audio-demos.

Submitted 2 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: Preprints

arXiv:2310.17927 [pdf, other]

A Pure Quantum Approximate Optimization Algorithm Based on CNR Operation

Authors: Da You Lv, An Min Wang

Abstract: By introducing the "comparison and replacement" (CNR) operation, we propose a general-purpose pure quantum approximate optimization algorithm and derive its core optimization mechanism quantitatively. The algorithm is constructed as a $p$-level divide-and-conquer structure based on the CNR operations. The quality of approximate optimization improves as $p$ increases. For sufficiently general optimization problems, the algorithm can work and produce near-optimal solutions as expected with considerably high probability. Moreover, we demonstrate that the algorithm is scalable to large problem sizes. Our algorithm is applied to two optimization problems with significantly different degeneracy, the Gaussian weighted 2-edge graph and MAX-2-XOR, and we then show the algorithm's performance in detail when the number of qubits required by the two optimization problems is 10.

Submitted 25 January, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.17897 [pdf, other]

Event Generation and Consistence Test for Physics with Sliced Wasserstein Distance

Authors: Chu-Cheng Pan, Xiang Dong, Yu-Chang Sun, Ao-Yan Cheng, Ao-Bo Wang, Yu-Xuan Hu, Hao Cai

Abstract: In the field of modern high-energy physics research, there is a growing emphasis on utilizing deep learning techniques to optimize event simulation, thereby expanding the statistical sample size for more accurate physical analysis. Traditional simulation methods often encounter challenges when dealing with complex physical processes and high-dimensional data distributions, resulting in slow performance. To overcome these limitations, we propose a solution based on deep learning with the sliced Wasserstein distance as the loss function. Our method shows its capability for high-precision and large-scale simulations, and demonstrates its effectiveness in handling complex physical processes. By employing an advanced transformer learning architecture, we initiate the learning process from a Monte Carlo sample, and generate high-dimensional data while preserving all original distribution features. The generated data samples have passed the consistence test, which is developed to calculate the confidence of the high-dimensional distributions of the generated data samples through permutation tests. This fast simulation strategy, enabled by deep learning, holds significant potential not only for increasing sample sizes and reducing statistical uncertainties but also for applications in numerical integration, which is crucial in partial wave analysis, high-precision sample checks, and other related fields. It opens up new possibilities for improving event simulation in high-energy physics research.

Submitted 27 October, 2023; originally announced October 2023.
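
Note on the entry above: the abstract names the sliced Wasserstein distance as the loss but does not spell it out. The following is a minimal sketch of the standard Monte Carlo estimator, not the authors' implementation. The function name `sliced_wasserstein`, the equal-sample-size restriction, and the default of 64 projections are assumptions made for illustration.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    xs, ys: lists of equal length, each item a list of floats (one point).
    Assumption of this sketch: both samples have the same size, so the
    1D optimal transport plan simply pairs the sorted projections.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere (normalized Gaussian).
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Project both samples onto the direction and sort them.
        px = sorted(sum(a * b for a, b in zip(p, d)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, d)) for p in ys)
        # Squared 1D 2-Wasserstein distance between the projections.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    shifted = [[x + 3.0, y] for x, y in sample]
    print(sliced_wasserstein(sample, sample))   # identical samples
    print(sliced_wasserstein(sample, shifted))  # shifted copy
```

Each projection reduces the problem to one dimension, where sorting solves optimal transport exactly; averaging over random directions is what makes the distance cheap enough to use as a training loss.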

arXiv:2310.17870 [pdf, other]

Ranking with Slot Constraints

Authors: Wentao Guo, Andrew Wang, Bradon Thymes, Thorsten Joachims

Abstract: We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible participants in a medical trial. We show that the conventional Probability Ranking Principle (PRP) can be highly sub-optimal for slot-constrained ranking problems, and we devise a new ranking algorithm, called MatchRank. The goal of MatchRank is to produce rankings that maximize the number of filled slots if candidates are evaluated by a human decision maker in the order of the ranking. In this way, MatchRank generalizes the PRP, and it subsumes the PRP as a special case when there are no slot constraints. Our theoretical analysis shows that MatchRank has a strong approximation guarantee without any independence assumptions between slots or candidates. Furthermore, we show how MatchRank can be implemented efficiently. Beyond the theoretical guarantees, empirical evaluations show that MatchRank can provide substantial improvements over a range of synthetic and real-world tasks.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.15766 [pdf, other]

Robust Learning via Conditional Prevalence Adjustment

Authors: Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu

Abstract: Healthcare data often come from multiple sites in which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they might fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g. Invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) the generation mechanism is stable, i.e. label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y. Our crucial observation is that confounding variables are routinely recorded in healthcare settings and the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario which is often overlooked by existing methods. Our experiments on synthetic and real data show CoPA beating competitive baselines.

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted at WACV

arXiv:2310.13419 [pdf, other]

Low Cross-Talk Optical Addressing of Trapped-Ion Qubits Using a Novel Integrated Photonic Chip

Authors: A. S. Sotirova, B. Sun, J. D. Leppard, A. Wang, M. Wang, A. Vazquez-Brennan, D. P. Nadlinger, S. Moser, A. Jesacher, C. He, F. Pokorny, M. J. Booth, C. J. Ballance

Abstract: Individual optical addressing in chains of trapped atomic ions requires generation of many small, closely spaced beams with low cross-talk. Furthermore, implementing parallel operations necessitates phase, frequency, and amplitude control of each individual beam. Here we present a scalable method for achieving all of these capabilities using a novel integrated photonic chip coupled to a network of optical fibre components. The chip design results in very low cross-talk between neighbouring channels even at the micrometre-scale spacing by implementing a very high refractive index contrast between the channel core and cladding. Furthermore, the photonic chip manufacturing procedure is highly flexible, allowing for the creation of devices with an arbitrary number of channels as well as non-uniform channel spacing at the chip output. We present the system used to integrate the chip within our ion trap apparatus and characterise the performance of the full individual addressing setup using a single trapped ion as a light-field sensor. Our measurements showed intensity cross-talk below $10^{-3}$ across the chip, with minimum observed cross-talk as low as $O\left(10^{-5}\right)$.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 14 pages, 12 figures

arXiv:2310.13294 [pdf, other]

doi 10.1145/3629606.3629643

VR PreM+ : An Immersive Pre-learning Branching Visualization System for Museum Tours

Authors: Ze Gao, Xiang Li, Changkun Liu, Xian Wang, Anqi Wang, Liang Yang, Yuyang Wang, Pan Hui, Tristan Braud

Abstract: We present VR PreM+, an innovative VR system designed to enhance web exploration beyond traditional computer screens. Unlike static 2D displays, VR PreM+ leverages 3D environments to create an immersive pre-learning experience. Using keyword-based information retrieval allows users to manage and connect various content sources in a dynamic 3D space, improving communication and data comparison. We conducted preliminary and user studies that demonstrated efficient information retrieval, increased user engagement, and a greater sense of presence. These findings yielded three design guidelines for future VR information systems: display, interaction, and user-centric design. VR PreM+ bridges the gap between traditional web browsing and immersive VR, offering an interactive and comprehensive approach to information acquisition. It holds promise for research, education, and beyond.

Submitted 1 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted for publication at The Eleventh International Symposium of Chinese CHI (Chinese CHI 2023), Bali

MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary) ACM Class: F.2.2; I.2.7

arXiv:2310.13124 [pdf]

Efficient online cross-covariance monitoring with incremental SVD: An approach for the detection of emerging dependency patterns in IoT systems

Authors: Xinmiao Luan, Qing Zou, Jian Li, Andi Wang

Abstract: The development of manufacturing systems has made it increasingly necessary to monitor the data generated by multiple interconnected subsystems with rapidly incoming samples. Based on incremental Singular Value Decomposition (ISVD), we develop a general online monitoring approach for the relationship of data generated from two interconnected subsystems, where each subsystem produces big data streams with several variation patterns in normal working condition. When a special situation arises and new associations occur, a very small amount of computation is sufficient to update the system status and compute the control statistics using this approach. The proposed method reduces computational overhead and retains only a small number of pairs of possible dependent patterns at each step. The validation of the method through simulation studies and a case study on semiconductor manufacturing processes further supports its effectiveness.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.12271 [pdf, ps, other]

Real-Variable Theory for New Herz-Type Hardy Spaces Associated with Ball Quasi-Banach Function Spaces

Authors: Aiting Wang, Wenhua Wang, Mingquan Wei

Abstract: Let $X$ be a ball quasi-Banach function space, $α\in \mathbb{R}$ and $q\in(0,\infty)$. In this paper, the authors first introduce the new Herz-type Hardy spaces $\mathcal{H\dot{K}}_{X}^{α,\,q}({\mathbb {R}}^n)$ and $\mathcal{HK}_{X}^{α,\,q}({\mathbb {R}}^n)$ associated with ball quasi-Banach function space $X$, via the non-tangential grand maximal function. Then, under some mild assumptions on $X$, the authors establish the real-variable theory for $\mathcal{H\dot{K}}_{X}^{α,\,q}({\mathbb {R}}^n)$ and $\mathcal{HK}_{X}^{α,\,q}({\mathbb {R}}^n)$, in terms of maximal function characterizations, atomic and molecular decompositions, and obtain the boundedness of some sublinear operators from $\mathcal{H\dot{K}}_{X}^{α,\,q}({\mathbb {R}}^n)$ to $\mathcal{\dot{K}}_{X}^{α,\,q}({\mathbb {R}}^n)$ and from $\mathcal{HK}_{X}^{α,\,q}({\mathbb {R}}^n)$ to $\mathcal{K}_{X}^{α,\,q}({\mathbb {R}}^n)$. As applications, we give two concrete function spaces which are members of Herz-type Hardy spaces associated with ball quasi-Banach function spaces.

Submitted 9 September, 2023; originally announced October 2023.

Comments: 37 pages, submitted

MSC Class: Primary 42B30; Secondary 42B35; 46E30

arXiv:2310.11240 [pdf, ps, other]

A Novel Mixed-Integer Linear Programming Formulation for Continuous-Time Inventory Routing

Authors: Akang Wang, Xiandong Li, Jeffrey E. Arbogast, Zachary Wilson, Chrysanthos E. Gounaris

Abstract: Inventory management, vehicle routing, and delivery scheduling decisions are simultaneously considered in the context of the inventory routing problem. This paper focuses on the continuous-time version of this problem where, unlike its more traditional discrete-time counterpart, the distributor is required to guarantee that inventory levels are maintained within the desired intervals at any moment of the planning horizon. In this work, we develop a compact mixed-integer linear programming formulation to model the continuous-time inventory routing problem. We further discuss means to expedite its solution process, including the adaptation of well-known rounded capacity inequalities to tighten the formulation in the context of a branch-and-cut algorithm. Through extensive computational studies on a suite of 90 benchmark instances from the literature, we show that our branch-and-cut algorithm outperforms the state-of-the-art approach. We also consider a new set of 63 instances adapted from a real-life dataset and show our algorithm's practical value in solving instances with up to 20 customers to guaranteed optimality.

Submitted 17 October, 2023; originally announced October 2023.

Comments: 27 pages

arXiv:2310.10058 [pdf, ps, other]

A lattice model with Fibonacci degree of degeneracy

Authors: Athena Wang

Abstract: In this paper, we explore two different methods of finding the degrees of degeneracy for lattice model systems, specifically constructing one with a Fibonacci degree of degeneracy. We also calculate the number of ground states per site as the golden ratio $(φ)$ for the system that we constructed and extend our results to systems with $k$-step Fibonacci degrees of degeneracy. Finally, we end with a few open questions that may be examined in future work.

Submitted 2 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 17 pages
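
Note on the entry above: the abstract does not describe the lattice model itself, so the following is only a toy illustration of how a Fibonacci degeneracy and a golden-ratio growth rate per site can arise. It assumes a hypothetical 1D open chain whose allowed states are the binary configurations with no two adjacent occupied sites. This constraint and the function names are assumptions, not the paper's construction.

```python
from itertools import product


def count_no_adjacent(n):
    """Brute-force count of length-n binary strings with no two adjacent 1s."""
    return sum(
        1
        for s in product((0, 1), repeat=n)
        if all(not (a and b) for a, b in zip(s, s[1:]))
    )


def fib_count(n):
    """Same count via the recurrence c(n) = c(n-1) + c(n-2), c(0)=1, c(1)=2."""
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    # The brute-force and recurrence counts agree on small chains.
    for n in range(1, 11):
        assert count_no_adjacent(n) == fib_count(n)
    # The growth factor per added site approaches the golden ratio.
    print(fib_count(31) / fib_count(30))
    print((1 + 5 ** 0.5) / 2)
```

Under this assumed constraint the counts 2, 3, 5, 8, ... are Fibonacci numbers, and the ratio of successive counts converges to $φ$, which is the sense in which the number of states per site equals the golden ratio.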

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2310.04409 [pdf, other]

Metal-Optic Nanophotonic Modulators in Standard CMOS Technology

Authors: Mohamed ElKabbash, Sivan Trajtenberg-Mills, Isaac Harris, Saumil Bandyopadhyay, Mohamed I Ibrahim, Archer Wang, Xibi Chen, Cole Brabec, Hasan Z. Yildiz, Ruonan Han, Dirk Englund

Abstract: Integrating nanophotonics with electronics promises revolutionary applications, from LiDAR to holographic displays. Although silicon photonics is maturing, realizing active nanophotonics in the ubiquitous bulk CMOS processes remains challenging. We introduce a fabless approach to embed active nanophotonics in bulk CMOS by co-designing the back-end-of-line metal layers for optical functionality. Using a 65nm CMOS process, we create plasmonic liquid crystal modulators with switching speeds 100x faster than commercial technologies. This zero-change nanophotonics method could equip mass-produced chips with optical communications, sensing and imaging. Embedding nanophotonics in the dominant electronics platform democratizes nanofabrication, spawning technologies from chip-scale LiDAR to holographic light-field displays.

Submitted 16 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.03594 [pdf, other]

Axion Universal Gravitational Wave Interpretation of Pulsar Timing Array Data

Authors: Kaloian D. Lozanov, Shi Pi, Misao Sasaki, Volodymyr Takhistov, Ao Wang

Abstract: Formation of cosmological solitons is generically accompanied by production of gravitational waves (GWs), with a universal GW background expected at frequency scales below that of non-linear dynamics. Beginning with a general phenomenological description of GWs associated with soliton formation, we demonstrate that the universal GW background from axion-like particle (ALP) solitonic oscillons provides a viable interpretation of the recent NANOGrav 15 year pulsar timing array data, which does not suffer from the overproduction of primordial black holes. We show that pulsar timing array data display a preference for models where formed solitons do not strongly interact or cluster. Coincidence observations with the Nancy Roman telescope will make it possible to discriminate between distinct scenarios of cosmological solitons.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 8 pages, 2 figures

Report number: IPMU23-0036, YITP-23-124, KEK-QUP-2023-0025, KEK-TH-2560, KEK-Cosmo-0328

Showing 151–200 of 1,301 results for author: Wang, A