
Sharpness and well-conditioning of nonsmooth convex formulations in statistical signal recovery

Abstract: We study a sample complexity vs. conditioning tradeoff in modern signal recovery problems where convex optimization problems are built from sampled observations. We begin by introducing a set of condition numbers related to sharpness in $\ell_p$ or Schatten-p norms ($p\in[1,2]$) based on nonsmooth reformulations of a class of convex optimization problems, including sparse recovery, low-rank matrix sensing, covariance estimation, and (abstract) phase retrieval. In each of the recovery tasks, we show that the condition numbers become dimension independent constants once the sample size exceeds some constant multiple of the recovery threshold. Structurally, this result ensures that the inaccuracy in the recovered signal due to both observation noise and optimization error is well-controlled. Algorithmically, such a result ensures that a new first-order method for solving the class of sharp convex functions in a given $\ell_p$ or Schatten-p norm, when applied to the nonsmooth formulations, achieves nearly-dimension-independent linear convergence.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.06567 [pdf, other]

A Versatile Method of Engineering the Electron Wavefunction of Hybrid Quantum Devices

Authors: Guoan Li, Guang Yang, Ting Lin, M. Rossi, G. Badawy, Zhiyuan Zhang, Xiaofan Shi, Jiayu Shi, Degui Qian, Fang Lu, Lin Gu, An-Qi Wang, Zhaozheng Lyu, Guangtong Liu, Fanming Qu, Ziwei Dou, Qinghua Zhang, E. P. A. M. Bakkers, M. P. Nowak, P. Wójcik, Li Lu, Jie Shen

Abstract: With the development of quantum technology, hybrid devices that combine superconductors (S) and semiconductors (Sm) have attracted great attention due to the possibility of engineering structures that benefit from the integration of the properties of both materials. However, until now, none of the experiments have reported good control of band alignment at the interface, which determines the strength of S-Sm coupling and the proximitized superconducting gap. Here, we fabricate hybrid devices in a generic way with argon milling to modify the interface while maintaining its high quality. First, after the milling the atomically connected S-Sm interfaces appear, resulting in a large induced gap, as well as the ballistic transport revealed by the multiple Andreev reflections and quantized above-gap conductance plateaus. Second, by comparing transport measurement with Schrödinger-Poisson (SP) calculations, we demonstrate that argon milling is capable of varying the band bending strength in the semiconducting wire as the electrons tend to accumulate on the etched surface for longer milling time. Finally, we perform nonlocal measurements on advanced devices to demonstrate the coexistence and tunability of crossed Andreev reflection (CAR) and elastic co-tunneling (ECT) -- key ingredients for building the prototype setup for realization of Kitaev chain and quantum entanglement probing. Such a versatile method, compatible with the standard fabrication process and accompanied by the well-controlled modification of the interface, will definitely boost the creation of more sophisticated hybrid devices for exploring physics in solid-state systems.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 18 pages, 9 figures

arXiv:2307.05898 [pdf, other]

Rectifying Noisy Labels with Sequential Prior: Multi-Scale Temporal Feature Affinity Learning for Robust Video Segmentation

Authors: Beilei Cui, Minqing Zhang, Mengya Xu, An Wang, Wu Yuan, Hongliang Ren

Abstract: Noisy label problems are inevitably in existence within medical image segmentation causing severe performance degradation. Previous segmentation methods for noisy label problems only utilize a single image while the potential of leveraging the correlation between images has been overlooked. Especially for video segmentation, adjacent frames contain rich contextual information beneficial in cognizing noisy labels. Based on two insights, we propose a Multi-Scale Temporal Feature Affinity Learning (MS-TFAL) framework to resolve noisy-labeled medical video segmentation issues. First, we argue the sequential prior of videos is an effective reference, i.e., pixel-level features from adjacent frames are close in distance for the same class and far in distance otherwise. Therefore, Temporal Feature Affinity Learning (TFAL) is devised to indicate possible noisy labels by evaluating the affinity between pixels in two adjacent frames. We also notice that the noise distribution exhibits considerable variations across video, image, and pixel levels. In this way, we introduce Multi-Scale Supervision (MSS) to supervise the network from three different perspectives by re-weighting and refining the samples. This design enables the network to concentrate on clean samples in a coarse-to-fine manner. Experiments with both synthetic and real-world label noise demonstrate that our method outperforms recent state-of-the-art robust segmentation approaches. Code is available at https://github.com/BeileiCui/MS-TFAL.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Accepted by MICCAI 2023

arXiv:2307.05468 [pdf, other]

My3DGen: A Scalable Personalized 3D Generative Model

Authors: Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta

Abstract: In recent years, generative 3D face models (e.g., EG3D) have been developed to tackle the problem of synthesizing photo-realistic faces. However, these models are often unable to capture facial features unique to each individual, highlighting the importance of personalization. Some prior works have shown promise in personalizing generative face models, but these studies primarily focus on 2D settings. Also, these methods require both fine-tuning and storing a large number of parameters for each user, posing a hindrance to achieving scalable personalization. Another challenge of personalization is the limited number of training images available for each individual, which often leads to overfitting when using full fine-tuning methods. Our proposed approach, My3DGen, generates a personalized 3D prior of an individual using as few as 50 training images. My3DGen allows for novel view synthesis, semantic editing of a given face (e.g. adding a smile), and synthesizing novel appearances, all while preserving the original person's identity. We decouple the 3D facial features into global features and personalized features by freezing the pre-trained EG3D and training additional personalized weights through low-rank decomposition. As a result, My3DGen introduces only $\textbf{240K}$ personalized parameters per individual, leading to a $\textbf{127}\times$ reduction in trainable parameters compared to the $\textbf{30.6M}$ required for fine-tuning the entire parameter space. Despite this significant reduction in storage, our model preserves identity features without compromising the quality of downstream applications.

Submitted 20 May, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: Project page: https://luchaoqi.com/my3dgen/

arXiv:2307.03012 [pdf, other]

A Kerr-Newman-MOG black hole's impact on the magnetic reconnection

Authors: Sanjar Shaymatov, Mirzabek Alloqulov, Bobomurat Ahmedov, Anzhong Wang

Abstract: In this paper, we study the magnetic reconnection process of energy extraction from a rapidly rotating Kerr-Newman-MOG black hole by investigating the combined effect of black hole charge and the MOG parameter. We explore the energy efficiency of energy extraction and power by applying the new energy extraction mechanism proposed by Comisso and Asenjo. Based on an attractive gravitational charge of the MOG parameter $α$ that physically manifests to strengthen black hole gravity we show that the combined effect of the MOG parameter and black hole charge can play an increasingly important role and accordingly lead to high energy efficiency and power for the energy extraction via the magnetic reconnection. Further, we study to estimate the rate of energy extraction under the fast magnetic reconnection by comparing the power of the magnetic reconnection and Blandford-Znajek (BZ) mechanisms. We show that the rate of energy extraction increases as a consequence of the combined effect of black hole charge and MOG parameter. It suggests that magnetic reconnection is significantly more efficient than BZ. In fact, the magnetic reconnection is fueled by magnetic field energy due to the twisting of magnetic field lines around the black hole for the plasma acceleration, and thus MOG parameter gives rise to even more fast spin that can strongly change the magnetic field reconfiguration due to the frame dragging effect. This is how energy extraction is strongly enhanced through the magnetic reconnection, thus making the energy extraction surprisingly more efficient for the Kerr-Newman-MOG black hole than Kerr black hole under the combined effect of black hole charge and MOG parameter.

Submitted 12 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 15 pages, one table, 7 captioned figures. Some inaccurate statements corrected, the result remains unaltered

arXiv:2307.02626 [pdf, ps, other]

Real-time Workload Pattern Analysis for Large-scale Cloud Databases

Authors: Jiaqi Wang, Tianyi Li, Anni Wang, Xiaoze Liu, Lu Chen, Jie Chen, Jianye Liu, Junyang Wu, Feifei Li, Yunjun Gao

Abstract: Hosting database services on cloud systems has become a common practice. This has led to the increasing volume of database workloads, which provides the opportunity for pattern analysis. Discovering workload patterns from a business logic perspective is conducive to better understanding the trends and characteristics of the database system. However, existing workload pattern discovery systems are not suitable for large-scale cloud databases which are commonly employed by the industry. This is because the workload patterns of large-scale cloud databases are generally far more complicated than those of ordinary databases. In this paper, we propose Alibaba Workload Miner (AWM), a real-time system for discovering workload patterns in complicated large-scale workloads. AWM encodes and discovers the SQL query patterns logged from user requests and optimizes the querying processing based on the discovered patterns. First, Data Collection & Preprocessing Module collects streaming query logs and encodes them into high-dimensional feature embeddings with rich semantic contexts and execution features. Next, Online Workload Mining Module separates encoded queries by business groups and discovers the workload patterns for each group. Meanwhile, Offline Training Module collects labels and trains the classification model using the labels. Finally, Pattern-based Optimizing Module optimizes query processing in cloud databases by exploiting discovered patterns. Extensive experimental results on one synthetic dataset and two real-life datasets (extracted from Alibaba Cloud databases) show that AWM enhances the accuracy of pattern discovery by 66% and reduces the latency of online inference by 22%, compared with the state of the art.

Submitted 5 July, 2023; originally announced July 2023.

Comments: Proceedings of the VLDB Volume 16 (VLDB 2023)

arXiv:2307.02452 [pdf, other]

LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion

Authors: Long Bai, Tong Chen, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gradually attracts researchers. Given the exuberant development of the denoising diffusion probabilistic model (DDPM) in computer vision, we introduce a WCE LLIE framework based on the multi-scale convolutional neural network (CNN) and reverse diffusion process. The multi-scale design allows models to preserve high-resolution representation and context information from low-resolution, while the curved wavelet attention (CWA) block is proposed for high-frequency and local feature learning. Furthermore, we combine the reverse diffusion procedure to further optimize the shallow output and generate the most realistic image. The proposed method is compared with ten state-of-the-art (SOTA) LLIE methods and significantly outperforms quantitatively and qualitatively. The superior performance on GI disease segmentation further demonstrates the clinical potential of our proposed model. Our code is publicly accessible.

Submitted 22 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/LLCaps

arXiv:2307.02052 [pdf]

Replicability of Simulation Studies for the Investigation of Statistical Methods: The RepliSims Project

Authors: K. Luijken, A. Lohmann, U. Alter, J. Claramunt Gonzalez, F. J. Clouth, J. L. Fossum, L. Hesen, A. H. J. Huizing, J. Ketelaar, A. K. Montoya, L. Nab, R. C. C. Nijman, B. B. L. Penning de Vries, T. D. Tibbe, Y. A. Wang, R. H. H. Groenwold

Abstract: Results of simulation studies evaluating the performance of statistical methods are often considered actionable and thus can have a major impact on the way empirical research is implemented. However, so far there is limited evidence about the reproducibility and replicability of statistical simulation studies. Therefore, eight highly cited statistical simulation studies were selected, and their replicability was assessed by teams of replicators with formal training in quantitative methodology. The teams found relevant information in the original publications and used it to write simulation code with the aim of replicating the results. The primary outcome was the feasibility of replicability based on reported information in the original publications. Replicability varied greatly: Some original studies provided detailed information leading to almost perfect replication of results, whereas other studies did not provide enough information to implement any of the reported simulations. Replicators had to make choices regarding missing or ambiguous information in the original studies, error handling, and software environment. Factors facilitating replication included public availability of code, and descriptions of the data-generating procedure and methods in graphs, formulas, structured text, and publicly accessible additional resources such as technical reports. Replicability of statistical simulation studies was mainly impeded by lack of information and sustainability of information sources. Reproducibility could be achieved for simulation studies by providing open code and data as a supplement to the publication. Additionally, simulation studies should be transparently reported with all relevant information either in the research paper itself or in easily accessible supplementary material to allow for replicability.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 36 pages, 0 figures

arXiv:2306.16752 [pdf, other]

Rapid FRD determination for multiplexed fibre systems -- I. The quasi-near field model and its uncertainties

Authors: Weimin Sun, Xudong Chen, Jiabin Wang, Hang Jiang, Anzhi Wang, Qi Yan, Zhenyu Ma, Shengjia Wang, Tao Geng, Yue Zhong, Zhongquan Qu, Yunxiang Yan

Abstract: Focal Ratio Degradation (FRD) in fibres is a crucial factor to control in astronomical instruments in order to minimize light loss. As astronomical instrumentation has advanced, the integration of large populations of fibres has become common. However, determining FRD in multiplexed fibre systems has become a challenging and time-consuming task. The Integral Field Unit for the Fiber Arrayed Solar Optical Telescope (FASOT-IFU) represents the most densely arranged fibre-based IFU in a single unit. Due to the close packing of fibres in the V-groove of the slit end, measuring FRD is particularly challenging as the output spots are prone to overlapping with adjacent fibres. In this paper, a novel method based on the quasi-near field model is proposed to enable rapid FRD measurement in highly multiplexed fibre systems like IFUs and multi-object observation systems. The principle and uncertainties associated with the method are investigated. The method's validity is demonstrated by applying it to determine the FRD in FASOT-IFU, with the achieved FRD performance meeting the acceptable requirements of FASOT-IFU, where the output focal ratio primarily falls within the range of 5.0-7.0. The results indicate that the proposed method offers several advantages, including the simultaneous and rapid measurement of FRD in multiple fibres with high accuracy (error smaller than 0.35 in F-ratio). Furthermore, besides FRD, the method exhibits potential for extensive measurements of throughput, scrambling, and spectral analysis.

Submitted 29 June, 2023; originally announced June 2023.

Comments: 10 pages, 12 figures, submitted to MNRAS

arXiv:2306.16285 [pdf, other]

Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis

Authors: An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract: Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and employed to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to diversify the training data further. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instruments datasets generation and segmentation framework can achieve encouraging performance compared with training with real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows its superiority of generalization by a considerable margin. We expect that our inspiring results will attract research attention to improving model generalization with data synthesizing.

Submitted 28 June, 2023; originally announced June 2023.

Comments: First two authors contributed equally. Accepted by IROS2023

arXiv:2306.15691 [pdf]

doi 10.1021/acsami.2c02956

Clean BN encapsulated 2D FETs with lithography compatible contacts

Authors: Binxi Liang, Anjian Wang, Jian Zhou, Shihao Ju, Jian Chen, Kenji Watanabe, Takashi Taniguchi, Yi Shi, Songlin Li

Abstract: Device passivation through ultraclean hexagonal BN encapsulation is proven one of the most effective ways for constructing high-quality devices with atomically thin semiconductors that preserves the ultraclean interface quality and intrinsic charge transport behavior. However, it remains challenging to integrate lithography compatible contact electrodes with flexible distributions and patterns. Here, we report the feasibility in straightforwardly integrating lithography defined contacts into BN encapsulated 2D FETs, giving rise to overall device quality comparable to the state-of-the-art results from the painstaking pure dry transfer processing. Electronic characterization on FETs consisting of WSe$_2$ and MoS$_2$ channels reveals an extremely low scanning hysteresis of ca. 2 mV on average, a low density of interfacial charged impurity of ca. $10^{11}\,$cm$^{-2}$, and generally high charge mobilities over $1000\,$cm$^{2}\cdot$V$^{-1}\cdot$s$^{-1}$ at low temperatures. The overall high device qualities verify the viability in directly integrating lithography defined contacts into BN encapsulated devices to exploit their intrinsic charge transport properties for advanced electronics.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 17 pages, 4 figures

Journal ref: ACS Applied Materials & Interfaces, 14, 18697 (2022)

arXiv:2306.13794 [pdf, other]

Tensor Dirichlet Process Multinomial Mixture Model for Passenger Trajectory Clustering

Authors: Ziyue Li, Hao Yan, Chen Zhang, Andi Wang, Wolfgang Ketter, Lijun Sun, Fugee Tsung

Abstract: Passenger clustering based on travel records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, namely: each passenger has multiple trips, and each trip contains multi-dimensional multi-mode information. Furthermore, existing approaches rely on an accurate specification of the clustering number to start, which is difficult when millions of commuters are using the transport systems on a daily basis. In this paper, we propose a novel Tensor Dirichlet Process Multinomial Mixture model (Tensor-DPMM), which is designed to preserve the multi-mode and hierarchical structure of the multi-dimensional trip information via tensor, and cluster them in a unified one-step manner. The model also has the ability to determine the number of clusters automatically by using the Dirichlet Process to decide the probabilities for a passenger to be either assigned in an existing cluster or to create a new cluster: This allows our model to grow the clusters as needed in a dynamic manner. Finally, existing methods do not consider spatial semantic graphs such as geographical proximity and functional similarity between the locations, which may cause inaccurate clustering. To this end, we further propose a variant of our model, namely the Tensor-DPMM with Graph. For the algorithm, we propose a tensor Collapsed Gibbs Sampling method, with an innovative step of "disband and relocating", which disbands clusters with too small amount of members and relocates them to the remaining clustering. This avoids uncontrollable growing amounts of clusters. A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of learning the number of clusters, and the learned clusters are better in within-cluster compactness and cross-cluster separateness.

Submitted 23 June, 2023; originally announced June 2023.

Comments: Under Review of Transportation Research Part C: Emerging Technologies

arXiv:2306.12109 [pdf, other]

DiffuseIR:Diffusion Models For Isotropic Reconstruction of 3D Microscopic Images

Authors: Mingjie Pan, Yulu Gan, Fangxu Zhou, Jiaming Liu, Aimin Wang, Shanghang Zhang, Dawei Li

Abstract: Three-dimensional microscopy is often limited by anisotropic spatial resolution, resulting in lower axial resolution than lateral resolution. Current State-of-The-Art (SoTA) isotropic reconstruction methods utilizing deep neural networks can achieve impressive super-resolution performance in fixed imaging settings. However, their generality in practical use is limited by degraded performance caused by artifacts and blurring when facing unseen anisotropic factors. To address these issues, we propose DiffuseIR, an unsupervised method for isotropic reconstruction based on diffusion models. First, we pre-train a diffusion model to learn the structural distribution of biological tissue from lateral microscopic images, resulting in generating naturally high-resolution images. Then we use low-axial-resolution microscopy images to condition the generation process of the diffusion model and generate high-axial-resolution reconstruction results. Since the diffusion model learns the universal structural distribution of biological tissues, which is independent of the axial resolution, DiffuseIR can reconstruct authentic images with unseen low-axial resolutions into a high-axial resolution without requiring re-training. The proposed DiffuseIR achieves SoTA performance in experiments on EM data and can even compete with supervised methods.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.11565 [pdf, other]

HomeRobot: Open-Vocabulary Mobile Manipulation

Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research work can improve performance. See videos on our website: https://ovmm.github.io/.

Submitted 10 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 37 pages, 22 figures, 8 tables

arXiv:2306.09285 [pdf]

doi 10.1038/s41586-023-06363-3

Quantum metric-induced nonlinear transport in a topological antiferromagnet

Authors: Naizhou Wang, Daniel Kaplan, Zhaowei Zhang, Tobias Holder, Ning Cao, Aifeng Wang, Xiaoyuan Zhou, Feifei Zhou, Zhengzhi Jiang, Chusheng Zhang, Shihao Ru, Hongbing Cai, Kenji Watanabe, Takashi Taniguchi, Binghai Yan, Weibo Gao

Abstract: The Berry curvature and quantum metric are the imaginary part and real part, respectively, of the quantum geometric tensor which characterizes the topology of quantum states. The former is known to generate a zoo of important discoveries such as quantum Hall effect and anomalous Hall effect (AHE), while the consequences of the quantum metric have rarely been probed by transport. In this work, we observed quantum metric induced nonlinear transport, including both nonlinear AHE and diode-like nonreciprocal longitudinal response, in thin films of a topological antiferromagnet, MnBi$_2$Te$_4$. Our observation reveals that the transverse and longitudinal nonlinear conductivities reverse signs when reversing the antiferromagnetic order, diminish above the Néel temperature, and are insensitive to disorder scattering, thus verifying their origin in the band structure topology. They also flip signs between electron and hole-doped regions, in agreement with theoretical calculations. Our work provides a pathway to probe the quantum metric through nonlinear transport and to design magnetic nonlinear devices.

Submitted 1 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: 23 pages, 6 figures for the manuscript; Supplementary information included

Journal ref: Nature (2023)

arXiv:2306.08997

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori

Abstract: We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.

Submitted 24 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Did not receive permission to release the data or model fine-tuned on the data

arXiv:2306.08478 [pdf, other]

Interfering Josephson diode effect and magnetochiral anisotropy in Ta2Pd3Te5 asymmetric edge interferometer

Authors: Yupeng Li, Dayu Yan, Yu Hong, Haohao Sheng, Anqi Wang, Ziwei Dou, Xingchen Guo, Xiaofan Shi, Zikang Su, Zhaozheng Lyu, Tian Qian, Guangtong Liu, Fanming Qu, Kun Jiang, Zhijun Wang, Youguo Shi, Zhu-An Xu, Jiangping Hu, Li Lu, Jie Shen

Abstract: Edge states in topological systems have attracted great interest due to their robustness and linear dispersions. Here a superconducting-proximitized edge interferometer is engineered on a topological insulator Ta2Pd3Te5 with asymmetric edges to realize the interfering Josephson diode effect (JDE), which offers many advantages, such as a high efficiency of up to 73% at tiny applied magnetic fields with an ultra-low switching power of around a picowatt, and a giant interfering magnetochiral anisotropy with a maximal coefficient $\gamma = 1.2 \times 10^{9}$ T$^{-1}$A$^{-1}$. As an important element to induce such JDE, the second-order harmonic in the current-phase relation is also experimentally confirmed by half-integer Shapiro steps. This edge interferometer offers a novel and effective method to enhance the overall performance of JDE and magnetochiral anisotropy, and shows great potential for applications in future superconducting quantum devices.

Submitted 2 June, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 29 pages, 21 figures

arXiv:2306.08343 [pdf]

A Unified Probabilistic Framework for Spatiotemporal Passenger Crowdedness Inference within Urban Rail Transit Network

Authors: Min Jiang, Andi Wang, Ziyue Li, Fugee Tsung

Abstract: This paper proposes the Spatio-Temporal Crowdedness Inference Model (STCIM), a framework to infer the passenger distribution inside the whole urban rail transit (URT) system in real-time. Our model is practical since it is designed in a probabilistic manner and relies only on the entry and exit timestamp information collected by the automatic fare collection (AFC) system. Firstly, the entire URT system is decomposed into several components of stations and segments. By decomposing a passenger's travel actions into entering, traveling, transferring, and exiting, we build a statistical model to estimate the passengers' lingering time within each component and the passengers' destination based on historical AFC data. Then, the passengers' spatial distribution is predicted in real-time based on each passenger's elapsed travel time and their entry station. The effectiveness of the scheme is validated with a real dataset from a real URT system.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted to IEEE CASE 2023

arXiv:2306.08339 [pdf]

Fermi Surface Evolution and Anomalous Hall Effect in an Ideal Type-II Weyl Semimetal

Authors: Qianni Jiang, Johanna C. Palmstrom, John Singleton, Shalinee Chikara, David Graf, Chong Wang, Yue Shi, Paul Malinowski, Aaron Wang, Zhong Lin, Lingnan Shen, Xiaodong Xu, Di Xiao, Jiun-Haw Chu

Abstract: Weyl semimetals (WSMs) are three-dimensional topological materials that exhibit fascinating properties due to the presence of Weyl nodes in their band structure. However, existing WSMs discovered so far often possess multiple pairs of Weyl nodes, posing a challenge in disentangling the contributions to transport phenomena from different energy bands. To overcome this challenge, we have identified field-induced ferromagnetic MnBi$_{2-x}$Sb$_{x}$Te$_{4}$ as an ideal type-II WSM with a single pair of Weyl nodes. By employing a combination of quantum oscillations and high-field Hall measurements, we have resolved the evolution of Fermi-surface sections as the Fermi level is tuned across the charge neutrality point, precisely matching the band structure of an ideal type-II WSM. Furthermore, the anomalous Hall conductivity exhibits a heartbeat-like behavior as the Fermi level is tuned across the Weyl nodes, a unique feature previously predicted for a type-II WSM. Our findings establish MnBi$_{2-x}$Sb$_{x}$Te$_{4}$ as an ideal platform for further investigation into Weyl physics.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2306.08103 [pdf, other]

Generating Images with 3D Annotations Using Diffusion Models

Authors: Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

Abstract: Diffusion models have emerged as a powerful generative method, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure in the generated images. Consequently, this hinders our ability to obtain detailed 3D annotations for the generated images or to craft instances with specific poses and distances. In this paper, we propose 3D Diffusion Style Transfer (3D-DST), which incorporates 3D geometry control into diffusion models. Our method exploits ControlNet, which extends diffusion models by using visual prompts in addition to text prompts. We generate images of the 3D objects taken from 3D shape repositories (e.g., ShapeNet and Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to improve a wide range of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings. We demonstrate the effectiveness of our method through extensive experiments on ImageNet-100/200, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV. The results show that our method significantly outperforms existing methods, e.g., 3.8 percentage points on ImageNet-100 using DeiT-B.

Submitted 3 April, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: ICLR 2024 Spotlight. Code: https://ccvl.jhu.edu/3D-DST/

arXiv:2306.05984 [pdf]

Noncoding RNAs evolutionarily extend animal lifespan

Authors: Anyou Wang

Abstract: The mechanisms underlying lifespan evolution in organisms have long been mysterious. However, recent studies have demonstrated that organisms evolutionarily gain noncoding RNAs (ncRNAs) that carry endogenous profound functions in higher organisms, including lifespan. This study unveils ncRNAs as crucial drivers of animal lifespan evolution. Species in the animal kingdom evolutionarily increase their ncRNA length in their genomes, coinciding with trimming mitochondrial genome length. This leads to lower energy consumption and ultimately lifespan extension. Notably, during lifespan extension, species exhibit a gradual acquisition of long-life ncRNA motifs while concurrently losing short-life motifs. These longevity-associated ncRNA motifs, such as GGTGCG, are particularly active in key tissues, including the endometrium, ovary, testis, and cerebral cortex. The activation of ncRNAs in the ovary and endometrium offers insights into why women generally exhibit longer lifespans than men. This groundbreaking discovery reveals the pivotal role of ncRNAs in driving lifespan evolution and provides a fundamental foundation for the study of longevity and aging.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 13 pages and 4 figures

arXiv:2306.03622 [pdf, other]

FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping

Authors: Minchen Yu, Ao Wang, Dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang

Abstract: Serverless computing has become increasingly popular for machine learning inference. However, current serverless platforms lack efficient support for GPUs, limiting their ability to deliver low-latency inference. In this paper, we propose FaaSwap, a GPU-efficient serverless inference platform. FaaSwap employs a holistic approach to system and algorithm design. It maintains models in main memory and dynamically swaps them onto GPUs upon request arrivals (i.e., late binding), thereby enabling a large number of inference functions to efficiently share a node's GPUs. FaaSwap uses various techniques, including asynchronous API redirection, GPU runtime sharing, pipelined model execution, and efficient GPU memory management, to achieve the optimal performance. We also develop an interference-aware request scheduling algorithm that allows FaaSwap to meet the latency SLOs for individual inference functions. We have implemented FaaSwap as a prototype on a leading commercial serverless platform. Experimental evaluations demonstrate that, with model swapping, FaaSwap can concurrently serve hundreds of functions on a single worker node with 4 V100 GPUs, while achieving inference performance comparable to native execution (where each function runs on a dedicated GPU). When deployed on a 6-node production testbed, FaaSwap meets the latency SLOs for over 1k functions, the maximum that the testbed can handle concurrently.

Submitted 8 February, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.03511 [pdf, other]

Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical Image Segmentation

Authors: An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract: Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field to alleviate model degradation in the deployment site. To preserve the model performance across multiple testing domains, this work proposes the Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation. In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift in the deployment phase, where the higher the shift is, the harder it is to recognize the variance. Considering this, we progressively introduce more amplitude information from the target domain to the source domain in the frequency space during the curriculum-style training to smoothly schedule the semantic knowledge transfer in an easier-to-harder manner. Besides, we incorporate the training-time chained augmentation mixing to help expand the data distributions while preserving the domain-invariant semantics, which is beneficial for the acquired model to be more robust and generalize better to unseen domains. Extensive experiments on two segmentation tasks of Retina and Nuclei collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance. Meanwhile, our approach proves to be more robust under various corruption types and increasing severity levels. In addition, we show our method is also beneficial in the domain-adaptive classification task with skin lesion datasets. The code is available at https://github.com/lofrienger/Curri-AFDA.

Submitted 6 June, 2023; originally announced June 2023.

Comments: Work under review. First three authors contributed equally

arXiv:2306.01453 [pdf, ps, other]

doi 10.1103/PhysRevD.107.114018

Combined analysis of the $γn \to K^0Σ^0$ and $γn \to K^+Σ^-$ reactions

Authors: Neng-Chang Wei, Ai-Chao Wang, Fei Huang

Abstract: The recently released data on differential cross sections for $γn \to K^0Σ^0$ from the A2 and BGOOD Collaborations are used to examine the theoretical model constructed in our previous work [Phys. Rev. D \textbf{105}, 094017 (2022)] for $γn \to K^+Σ^-$, and it is found that the model predictions are able to qualitatively reproduce the A2 data but fail to describe the BGOOD data. Then, a combined analysis of the $γn \to K^0Σ^0$ and $γn \to K^+Σ^-$ reactions is performed to revise the theoretical model. Due to the inconsistency problem, the A2 and BGOOD data are included in fits separately. In the case of including the A2 data, both the data for $γn \to K^0Σ^0$ and $γn \to K^+Σ^-$ can be fairly well described, and the contributions from the $N(1710)1/2^+$, $N(1880)1/2^+$, $N(1900)3/2^+$, and $Δ(1920)3/2^+$ resonances are found to dominate the reactions in the lower energy region. While in the case of including the BGOOD data, although most of the data for the $γn \to K^+ Σ^-$ reaction can be described with the exception of some noticeable discrepancies on beam asymmetries at lower energies, the BGOOD data for $γn \to K^0Σ^0$ can be only qualitatively described, and the contributions from the $N(1710)1/2^+$, $N(1900)3/2^+$, and $Δ(1910)1/2^+$ resonances are found to dominate the reactions in the lower energy region. In both cases, the $t$-channel $K^\ast(892)$ exchange is found to play a crucial role at forward angles in the higher energy region. Further precise measurements of data for $γn \to K^0Σ^0$ are called on to disentangle the discrepancies between the data sets from the A2 and BGOOD Collaborations.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 14 pages, 17 figures; Accepted for publication in Physical Review D

arXiv:2306.00451 [pdf, other]

S$^2$ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation

Authors: An Wang, Mengya Xu, Yang Zhang, Mobarakol Islam, Hongliang Ren

Abstract: Fully-supervised polyp segmentation has accomplished significant triumphs over the years in advancing the early diagnosis of colorectal cancer. However, label-efficient solutions from weak supervision like scribbles are rarely explored yet primarily meaningful and demanding in medical practice due to the expensiveness and scarcity of densely-annotated polyp data. Besides, various deployment issues, including data shifts and corruption, put forward further requests for model generalization and robustness. To address these concerns, we design a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning (S$^2$ME). Concretely, for the first time in weakly-supervised medical image segmentation, we promote the dual-branch co-teaching framework by leveraging the intrinsic complementarity of features extracted from the spatial and spectral domains and encouraging cross-space consistency through collaborative optimization. Furthermore, to produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning, we introduce a novel adaptive pixel-wise fusion technique based on the entropy guidance from the spatial and spectral branches. Our strategy efficiently mitigates the deleterious effects of uncertainty and noise present in pseudo labels and surpasses previous alternatives in terms of efficacy. Ultimately, we formulate a holistic optimization objective to learn from the hybrid supervision of scribbles and pseudo labels. Extensive experiments and evaluation on four public datasets demonstrate the superiority of our method regarding in-distribution accuracy, out-of-distribution generalization, and robustness, highlighting its promising clinical significance. Our code is available at https://github.com/lofrienger/S2ME.

Submitted 1 June, 2023; originally announced June 2023.

Comments: MICCAI 2023 Early Acceptance

arXiv:2306.00118 [pdf, other]

Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis

Authors: Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski

Abstract: Human vision demonstrates higher robustness than current AI algorithms under out-of-distribution scenarios. It has been conjectured such robustness benefits from performing analysis-by-synthesis. Our paper formulates triple vision tasks in a consistent manner using approximate analysis-by-synthesis by render-and-compare algorithms on neural features. In this work, we introduce Neural Textured Deformable Meshes, which involve the object model with deformable geometry that allows optimization on both camera parameters and object geometries. The deformable mesh is parameterized as a neural field, and covered by whole-surface neural texture maps, which are trained to have spatial discriminability. During inference, we extract the feature map of the test image and subsequently optimize the 3D pose and shape parameters of our model using differentiable rendering to best reconstruct the target feature map. We show that our analysis-by-synthesis is much more robust than conventional neural networks when evaluated on real-world images and even in challenging out-of-distribution scenarios, such as occlusion and domain shift. Our algorithms are competitive with standard algorithms when tested on conventional performance measures.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.20087 [pdf, other]

Too Large; Data Reduction for Vision-Language Pre-Training

Authors: Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou

Abstract: This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original captions for selected samples, mitigating the text-image misalignment problem while maintaining uniqueness. As a result, TL;DR enables us to reduce the large dataset into a small set of high-quality data, which can serve as an alternative pre-training dataset. This algorithm significantly speeds up the time-consuming pretraining process. Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reduce the well-cleaned CC3M dataset from 2.82M to 0.67M ($\sim$24\%) and the noisy YFCC15M from 15M to 2.5M ($\sim$16.7\%). Extensive experiments with three popular VLP models over seven downstream tasks show that a VLP model trained on the compressed dataset provided by TL;DR can achieve similar or even better results compared with training on the full-scale dataset. The code will be made available at https://github.com/showlab/datacentric.vlp.

Submitted 18 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: ICCV2023. Code: https://github.com/showlab/datacentric.vlp

arXiv:2305.17663 [pdf, other]

Lexical Retrieval Hypothesis in Multimodal Context

Authors: Po-Ya Angela Wang, Pin-Er Chen, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

Abstract: Multimodal corpora have become an essential language resource for language science and grounded natural language processing (NLP) systems due to the growing need to understand and interpret human communication across various channels. In this paper, we first present our efforts in building the first Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we conduct a case study investigating the Lexical Retrieval Hypothesis (LRH), specifically examining whether the hand gestures co-occurring with speech constants facilitate lexical retrieval or serve other discourse functions. With detailed annotations on eight parliamentary interpellations in Taiwan Mandarin, we explore the co-occurrence between speech constants and non-verbal features (i.e., head movement, face movement, hand gesture, and function of hand gesture). Our findings suggest that while hand gestures do serve as facilitators for lexical retrieval in some cases, they also serve the purpose of information emphasis. This study highlights the potential of the MultiMoco Corpus to provide an important resource for in-depth analysis and further research in multimodal communication studies.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.16124 [pdf, other]

Robust Category-Level 3D Pose Estimation from Synthetic Data

Authors: Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Alan Yuille, Adam Kortylewski

Abstract: Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data is a significant challenge. In this work, we aim to narrow the performance gap between models trained on synthetic data and few real images and fully supervised models trained on large-scale data. We achieve this by approaching the problem from two perspectives: 1) We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models and enhanced with a novel algorithm. 2) We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering. In particular, we exploit the spatial relationships between features on the mesh surface and a contrastive learning scheme to guide the domain adaptation process. Combined, these two approaches enable our models to perform competitively with state-of-the-art models using only 10% of the respective real training images, while outperforming the SOTA model by 10.4% with a threshold of pi/18 using only 50% of the real training data. Our trained model further demonstrates robust generalization to out-of-distribution scenarios despite being trained with minimal real data.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.15733 [pdf, other]

doi 10.1103/PhysRevD.108.044005

Black hole scalarizations induced by parity violations

Authors: Hao-Jie Lin, Tao Zhu, Shao-Jun Zhang, Anzhong Wang

Abstract: It is well-known that parity symmetry is broken in the weak interaction but conserved for Einstein's general relativity and Maxwell's electromagnetic theory. Nevertheless, parity symmetry could also be violated in the gravitational/electromagnetic sectors if a fundamental scalar field couples to the parity-violating gravitational/electromagnetic curvature terms. Such parity-violating terms, which flip signs under reversed spatial directions, can inevitably lead to a negative effective mass squared for the scalar field perturbations near nonspherically symmetric black holes and thus are expected to trigger tachyonic instability. As illustrative examples, we show that the scalar field coupled to gravitational/electromagnetic Chern-Simons terms near a Kerr-Newman spacetime can develop tachyonic instabilities, leading to equilibrium scalar field configurations in certain parameter regions of black holes. This instability, which is an indication of the black hole scalarization process, can occur in a broad class of nonspherically symmetric black holes and parity-violating theories.

Submitted 27 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 9 pages, 3 figures, 1 table

Journal ref: Phys. Rev. D 108, 044005 (2023)

arXiv:2305.14767 [pdf]

Interpretation and visualization of distance covariance through additive decomposition of correlations formula

Authors: Andi Wang, Hao Yan, Juan Du

Abstract: Distance covariance is a widely used statistical methodology for testing the dependency between two groups of variables. Despite the appealing properties of consistency and superior testing power, the testing results of distance covariance are often hard to interpret. This paper presents an elementary interpretation of the mechanism of distance covariance through an additive decomposition of correlations formula. Based on this formula, a visualization method is developed to provide practitioners with a more intuitive explanation of the distance covariance score.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.14668 [pdf, other]

Robust 3D-aware Object Classification via Discriminative Render-and-Compare

Authors: Artur Jesslen, Guofeng Zhang, Angtian Wang, Alan Yuille, Adam Kortylewski

Abstract: In real-world applications, it is essential to jointly estimate the 3D object pose and class label of objects, i.e., to perform 3D-aware classification. While current approaches for either image classification or pose estimation can be extended to 3D-aware classification, we observe that they are inherently limited: 1) Their performance is much lower compared to the respective single-task models, and 2) they are not robust in out-of-distribution (OOD) scenarios. Our main contribution is a novel architecture for 3D-aware classification, which builds upon a recent work and performs comparably to single-task models while being highly robust. In our method, an object category is represented as a 3D cuboid mesh composed of feature vectors at each mesh vertex. Using differentiable rendering, we estimate the 3D object pose by minimizing the reconstruction error between the mesh and the feature representation of the target image. Object classification is then performed by comparing the reconstruction losses across object categories. Notably, the neural texture of the mesh is trained in a discriminative manner to enhance the classification performance while also avoiding local optima in the reconstruction loss. Furthermore, we show how our method and feed-forward neural networks can be combined to scale the render-and-compare approach to larger numbers of categories. Our experiments on PASCAL3D+, occluded-PASCAL3D+, and OOD-CV show that our method outperforms all baselines at 3D-aware classification by a wide margin in terms of performance and robustness.

Submitted 5 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.14616 [pdf, other]

Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis

Authors: Pin-Er Chen, Po-Ya Angela Wang, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

Abstract: This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.

Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 10 pages, 9 figures

arXiv:2305.13268 [pdf, other]

doi 10.1002/adfm.202302191

Spin-phonon scattering-induced low thermal conductivity in a van der Waals layered ferromagnet Cr$_2$Si$_2$Te$_6$

Authors: Kunya Yang, Hong Wu, Zefang Li, Chen Ran, Xiao Wang, Fengfeng Zhu, Xiangnan Gong, Yan Liu, Guiwen Wang, Long Zhang, Xinrun Mi, Aifeng Wang, Yisheng Chai, Yixi Su, Wenhong Wang, Mingquan He, Xiaolong Yang, Xiaoyuan Zhou

Abstract: Layered van der Waals (vdW) magnets are prominent playgrounds for developing magnetoelectric, magneto-optic and spintronic devices. In spintronics, particularly in spincaloritronic applications, low thermal conductivity ($κ$) is highly desired. Here, by combining thermal transport measurements with density functional theory calculations, we demonstrate low $κ$ down to 1 W m$^{-1}$ K$^{-1}$ in a typical vdW ferromagnet Cr$_2$Si$_2$Te$_6$. In the paramagnetic state, development of magnetic fluctuations way above $T_\mathrm{c}=$ 33 K strongly reduces $κ$ via spin-phonon scattering, leading to low $κ\sim$ 1 W m$^{-1}$ K$^{-1}$ over a wide temperature range, comparable to that of amorphous silica. In the magnetically ordered state, emergence of resonant magnon-phonon scattering limits $κ$ below $\sim$ 2 W m$^{-1}$ K$^{-1}$, which would be three times larger if magnetic scatterings were absent. Application of magnetic fields strongly suppresses the spin-phonon scattering, giving rise to large enhancements of $κ$. Our calculations well capture these complex behaviours of $κ$ by taking the temperature- and magnetic-field-dependent spin-phonon scattering into account. Realization of low $κ$, which is easily tunable by magnetic fields in Cr$_2$Si$_2$Te$_6$, may further promote spincaloritronic applications of vdW magnets. Our theoretical approach may also provide a generic understanding of spin-phonon scattering, which appears to play important roles in various systems.

Submitted 5 May, 2023; originally announced May 2023.

Comments: 14 pages, 6 figures, accepted for publication in Advanced Functional Materials

Journal ref: Adv. Funct. Mater. 2302191 (2023)

arXiv:2305.12726 [pdf, other]

doi 10.1145/3581783.3611737

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it could be affected by complicated factors, including various distortions and diverse contents. Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate to specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g. sharpness of a video). To solve this problem, we collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors, including in-capture authentic distortions (e.g. motion blur, noise, flicker), errors introduced by compression and transmission, and higher-level experiences on semantic contents and aesthetic issues (e.g. composition, camera trajectory), to establish the multi-dimensional Maxwell database. Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings, and to benchmark different categories of VQA algorithms on each dimension, so as to more comprehensively analyze their strengths and weaknesses.
Furthermore, we propose the MaxVQA, a language-prompted VQA approach that modifies the vision-language foundation model CLIP to better capture important quality issues as observed in our analyses. The MaxVQA can jointly evaluate various specific quality factors and final quality scores with state-of-the-art accuracy on all dimensions, and superb generalization ability on existing datasets. Code and data available at https://github.com/VQAssessment/MaxVQA.

Submitted 3 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Proceedings of the 31st ACM International Conference on Multimedia (MM '23)

arXiv:2305.09617 [pdf, other]

Towards Expert-Level Medical Question Answering with Large Language Models

Authors: Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal, Mike Schaekermann, Amy Wang, Mohamed Amin, Sami Lachgar, Philip Mansfield, Sushant Prakash, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Nenad Tomasev, Yun Liu, Renee Wong, Christopher Semturs, S. Sara Mahdavi, Joelle Barral, et al. (6 additional authors not shown)

Abstract: Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001).
We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.07152 [pdf, other]

Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge

Authors: Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Max Berniker, Ziheng Wang, Rogerio Nespolo, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, Winnie Pang, et al. (46 additional authors not shown)

Abstract: The ability to automatically detect and track surgical instruments in endoscopic videos can enable transformational interventions. Assessing surgical performance and efficiency, identifying skilled tool use and choreography, and planning operational and logistical aspects of OR resources are just a few of the applications that could benefit. Unfortunately, obtaining the annotations needed to train machine learning models to identify and localize surgical tools is a difficult task. Annotating bounding boxes frame-by-frame is tedious and time-consuming, yet large amounts of data with a wide variety of surgical tools and surgeries must be captured for robust training. Moreover, ongoing annotator training is needed to stay up to date with surgical instrument innovation. In robotic-assisted surgery, however, potentially informative data like timestamps of instrument installation and removal can be programmatically harvested. The ability to rely on tool installation data alone would significantly reduce the workload to train robust tool-tracking models. With this motivation in mind we invited the surgical data science community to participate in the challenge, SurgToolLoc 2022. The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools and localize them in video frames with bounding boxes. We present the results of this challenge along with many of the teams' efforts. We conclude by discussing these results in the broader context of machine learning and surgical data science.
The training data used for this challenge, consisting of 24,695 video clips with tool presence labels, is also being released publicly and can be accessed at https://console.cloud.google.com/storage/browser/isi-surgtoolloc-2022.

Submitted 31 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.05861 [pdf]

Template-based eukaryotic genome editing directed by SviCas3

Authors: Wang-Yu Tong, Yong Li, Shou-Dong Ye, An-Jing Wang, Yan-Yan Tang, Mei-Li Li, Zhong-Fan Yu, Ting-Ting Xia, Qing-Yang Liu, Si-Qi Zhu

Abstract: RNA-guided gene editing based on the CRISPR-Cas system is currently the most effective genome editing technique. Here, we report that the SviCas3 from the subtype I-B-Svi Cas system in Streptomyces virginiae IBL14 is an RNA-guided and DNA-guided DNA endonuclease suitable for the HDR-directed gene and/or base editing of eukaryotic cell genomes. The genome editing efficiency of SviCas3 guided by DNA is no less than that of SviCas3 guided by RNA. In particular, t-DNA, as a template and a guide, does not require a proto-spacer-adjacent motif, demonstrating that CRISPR, as the basis for crRNA design, is not required for the SviCas3-mediated gene and base editing. This discovery will broaden our understanding of enzyme diversity in CRISPR-Cas systems, will provide important tools for the creation and modification of living things and the treatment of human genetic diseases, and will usher in a new era of DNA-guided gene editing and base editing.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 113 pages, 12 figures and 4 tables

arXiv:2305.03925 [pdf, other]

Structure-Function Dynamics Hybrid Modeling: RNA Degradation

Authors: Hua Zheng, Wei Xie, Paul Whitford, Ailun Wang, Chunsheng Fang, Wandi Xu

Abstract: RNA structure and functional dynamics play fundamental roles in controlling biological systems. Molecular dynamics simulation, which can characterize interactions at an atomistic level, can advance the understanding of new drug discovery, manufacturing, and delivery mechanisms. However, it is computationally unattainable to support the development of a digital twin for enzymatic reaction network mechanism learning, and end-to-end bioprocess design and control. Thus, we create a hybrid ("mechanistic + machine learning") model characterizing the interdependence of RNA structure and functional dynamics from atomistic to macroscopic levels. To assess the proposed modeling strategy, in this paper, we consider RNA degradation, which is a critical process in cellular biology that affects gene expression. The empirical study on RNA lifetime prediction demonstrates the promising performance of the proposed multi-scale bioprocess hybrid modeling strategy.

Submitted 17 June, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: 12 pages, 5 figures

arXiv:2305.03470 [pdf, other]

doi 10.1093/mnrasl/slad051

Mildly Relativistic Motion in the Radio Quiet Quasar PG 1351+640

Authors: Ailing Wang, Tao An, Shaoguang Guo, Luis C. Ho, Willem A. Baan, Robert Braun, Sina Chen, Xiaopeng Cheng, Philippa Hartley, Jun Yang, Yingkang Zhang

Abstract: Measuring the proper motion of the emission component in radio-quiet quasars (RQQs) could help to distinguish between the origins of the radio emission and to understand whether the jet production mechanism is the same in radio-loud quasars (RLQs) and RQQs. PG 1351+640 is one of the few RQQs suitable for proper motion studies: it has two compact components on milli-arcsecond scales, a flat-spectrum core and a steep-spectrum jet; both components are >2 mJy at 5 GHz and are well suited for Very Long Baseline Array (VLBA) observations. We compare recent VLBA observations with those made seventeen years ago and find no significant change in the core-jet separation between 2005 and 2015 (a proper motion of 0.003 mas yr$^{-1}$). However, the core-jet separation increased significantly between 2015 and 2022, implying a jet proper motion of 0.063 mas yr$^{-1}$, which corresponds to an apparent transverse velocity of 0.37c. The result suggests that the jet of the RQQ PG 1351+640 is mildly relativistic and oriented at a relatively small viewing angle.

Submitted 5 May, 2023; originally announced May 2023.

Comments: The article has been published by Oxford University Press: https://academic.oup.com/mnrasl/advance-article-abstract/doi/10.1093/mnrasl/slad051/7146829?utm_source=advanceaccess&utm_campaign=mnrasl&utm_medium=email

arXiv:2305.01776 [pdf, other]

Taxonomizing and Measuring Representational Harms: A Look at Image Tagging

Authors: Jared Katzman, Angelina Wang, Morgan Scheuerman, Su Lin Blodgett, Kristen Laird, Hanna Wallach, Solon Barocas

Abstract: In this paper, we examine computational approaches for measuring the "fairness" of image tagging systems, finding that they cluster into five distinct categories, each with its own analytic foundation. We also identify a range of normative concerns that are often collapsed under the terms "unfairness," "bias," or even "discrimination" when discussing problematic cases of image tagging. Specifically, we identify four types of representational harms that can be caused by image tagging systems, providing concrete examples of each. We then consider how different computational measurement approaches map to each of these types, demonstrating that there is not a one-to-one mapping. Our findings emphasize that no single measurement approach will be definitive and that it is not possible to infer from the use of a particular measurement approach which type of harm was intended to be measured. Lastly, equipped with this more granular understanding of the types of representational harms that can be caused by image tagging systems, we show that attempts to mitigate some of these types of harms may be in tension with one another.

Submitted 2 May, 2023; originally announced May 2023.

Comments: AAAI-23 Special Track on AI for Social Impact

Journal ref: Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

arXiv:2305.01638 [pdf, other]

Sequence Modeling with Multiresolution Convolutional Memory

Authors: Jiaxin Shi, Ke Alexander Wang, Emily B. Fox

Abstract: Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most an $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.

Submitted 1 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: ICML 2023, Source code: https://github.com/thjashin/multires-conv

arXiv:2305.01271 [pdf, other]

Lipid exchange promotes fusion of model protocells

Authors: Ziyan Fan, Yaam Deckel, Lauren A. Lowe, Daniel W. K. Loo, Tetsuya Yomo, Jack W. Szostak, Collin Nisler, Anna Wang

Abstract: Vesicle fusion is an important process underlying cell division, transport, and membrane trafficking. In phospholipid systems, a range of fusogens including divalent cations and depletants have been shown to induce adhesion, hemifusion, and then full content fusion between vesicles. This work shows that these fusogens do not perform the same function for fatty acid vesicles, which are used as model protocells (primitive cells). Even when fatty acid vesicles appear adhered or hemifused to each other, the intervening barriers between vesicles do not rupture. This difference is likely because fatty acids have a single aliphatic tail, and are more dynamic than their phospholipid counterparts. To address this, we postulate that fusion could instead occur under conditions, such as lipid exchange, that disrupt lipid packing. Using both experiments and molecular dynamics simulations, we verify that fusion in fatty acid systems can indeed be induced by lipid exchange. These results begin to probe how membrane biophysics could constrain the evolutionary dynamics of protocells.

Submitted 2 May, 2023; originally announced May 2023.

Comments: 15 pages, 7 figures

arXiv:2305.00510 [pdf, other]

Towards AI-Architecture Liberty: A Comprehensive Survey on Designing and Collaborating Virtual Architecture by Deep Learning in the Metaverse

Authors: Anqi Wang, Jiahua Dong, Lik-Hang Lee, Jiachuan Shen, Pan Hui

Abstract: 3D shape generation techniques leveraging deep learning have garnered significant interest from both the computer vision and architectural design communities, promising to enrich the content of the future metaverse. However, research on virtual architectural design remains limited, particularly regarding human-AI collaboration and deep learning-assisted design. We first illuminate the principles, generation techniques, and current literature of virtual architecture, focusing on challenges such as datasets, multimodality, design intuition, and generative frameworks. In our survey, we reviewed 187 related articles (80.7% of articles published between 2018 and 2022) covering architectural research, virtual environments, and technical approaches. This survey investigates the latest approaches to 3D object generation with deep generative models (DGMs) and summarizes four characteristics of deep-learning generation approaches for virtual architecture. According to our analysis of the survey, we expound on four research agendas, including agency, communication, user consideration, and integrating tools, and highlight three important enablers of ubiquitous interaction with immersive systems in deep learning-assisted architectural generation. Our work contributes to fostering understanding between designers and deep learning techniques, broadening access to human-AI collaboration. We advocate for interdisciplinary efforts to address this timely research topic, facilitating content designing and generation in the metaverse.

Submitted 7 April, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: 37 pages, 9 figures, and 5 tables

ACM Class: I.2.1; J.5; J.6; I.3.7

arXiv:2304.14674 [pdf, other]

SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective

Authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with the prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery in various settings of (i) prompted vs. unprompted; (ii) bounding box vs. points-based prompt; (iii) generalization under corruptions and perturbations with five severity levels; and (iv) state-of-the-art supervised model vs. SAM. We conduct all the observations with two well-known robotic instrument segmentation datasets of MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict the parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as different classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, it is unable to identify instruments in some complex surgical scenarios of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. Therefore, we can argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning.

Submitted 28 April, 2023; originally announced April 2023.

Comments: Work under active progress

arXiv:2304.14672 [pdf, other]

Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment

Authors: Haoning Wu, Liang Liao, Annan Wang, Chaofeng Chen, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: The proliferation of videos collected during in-the-wild natural settings has pushed the development of effective Video Quality Assessment (VQA) methodologies. Contemporary supervised opinion-driven VQA strategies predominantly hinge on training from expensive human annotations for quality scores, which limited the scale and distribution of VQA datasets and consequently led to unsatisfactory generalization capacity of methods driven by these data. On the other hand, although several handcrafted zero-shot quality indices do not require training from human opinions, they are unable to account for the semantics of videos, rendering them ineffective in comprehending complex authentic distortions (e.g., white balance, exposure) and assessing the quality of semantic content within videos. To address these challenges, we introduce the text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP) to ascertain the affinity between textual prompts and visual features, facilitating a comprehensive examination of semantic quality concerns without the reliance on human quality annotations. By amalgamating SAQI with existing low-level metrics, we propose the unified Blind Video Quality Index (BVQI) and its improved version, BVQI-Local, which demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. Moreover, we devise an efficient fine-tuning scheme for BVQI-Local that jointly optimizes text prompts and final fusion weights, resulting in state-of-the-art performance and superior generalization ability in comparison to prevalent opinion-driven VQA methods. We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 13 pages, 10 figures, under review

arXiv:2304.14300 [pdf, other]

Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox

Abstract: Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecast than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.

Submitted 27 April, 2023; originally announced April 2023.

Comments: Work presented at NeurIPS 2022 Workshop on Learning from Time Series for Health (TS4H). arXiv admin note: substantial text overlap with arXiv:2302.11939

arXiv:2304.14160 [pdf, other]

doi 10.1103/PhysRevD.108.024035

Periodic orbits and their gravitational wave radiations in a polymer black hole in loop quantum gravity

Authors: Ze-Yi Tu, Tao Zhu, Anzhong Wang

Abstract: This article provides a detailed investigation into the motion of the surrounding particles around a polymer black hole in loop quantum gravity (LQG). Using effective potential, the critical bound orbits and innermost stable circular orbits (ISCO) are analyzed. The study finds that the radii and angular momentum of the critical bound orbits decrease with an increase in the parameter $A_λ$ which labels the LQG effects, while the energy and angular momentum of the ISCO also decrease with an increase in $A_λ$. Based on these findings, we then explore the periodic orbits of the polymer black hole in LQG using rational numbers composed of three integers. Our results show that the rational numbers increase with the energy of particles and decrease with the increase of angular momentum based on a classification scheme. Moreover, compared to a Schwarzschild black hole, the periodic orbits in a polymer black hole in LQG consistently have lower energy, providing a potential method for distinguishing a polymer black hole in LQG from a Schwarzschild black hole. Finally, we also examine the gravitational wave radiations of the periodic orbits of a test object which orbits a supermassive polymer black hole in LQG, which generates intricate GW waveforms that can aid in exhibiting the gravitational structure of the system.

Submitted 20 July, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: 14 pages, 10 figures, 2 tables;v2:version appeared in PRD

Journal ref: Phys Rev D 108 (2023) 2, 024035

arXiv:2304.13138 [pdf, other]

The Update-Equivalence Framework for Decision-Time Planning

Authors: Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Abstract: The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence. In this update-equivalence framework, decision-time planning algorithms replicate the updates of last-iterate algorithms, which need not rely on public information. This facilitates scalability to games with large amounts of non-public information. Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent. We validate the performance of these algorithms in cooperative and adversarial domains, notably in Hanabi, the standard benchmark for search in fully cooperative imperfect-information games. Here, our mirror descent approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time. This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.

Submitted 13 May, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.12164 [pdf, other]

USA-Net: Unified Semantic and Affordance Representations for Robot Memory

Authors: Benjamin Bolte, Austin Wang, Jimmy Yang, Mustafa Mukadam, Mrinal Kalakrishnan, Chris Paxton

Abstract: In order for robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner which can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories which are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners which don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website: https://usa.bolte.cc/

Submitted 24 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Showing 251–300 of 1,301 results for author: Wang, A