Search | arXiv e-print repository

TaiChi Action Capture and Performance Analysis with Multi-view RGB Cameras

Authors: Jianwei Li, Siyu Mo, Yanfei Shen

Abstract: Recent advances in computer vision and deep learning have influenced the field of sports performance analysis for researchers to track and reconstruct freely moving humans without any marker attachment. However, there are few works for vision-based motion capture and intelligent analysis for professional TaiChi movement. In this paper, we propose a framework for TaiChi performance capture and analysis with multi-view geometry and artificial intelligence technology. The main innovative work is as follows: 1) A multi-camera system suitable for TaiChi motion capture is built and the multi-view TaiChi data is collected and processed; 2) A combination of traditional visual method and implicit neural radiance field is proposed to achieve sparse 3D skeleton fusion and dense 3D surface reconstruction. 3) The normalization modeling of movement sequences is carried out based on motion transfer, so as to realize TaiChi performance analysis for different groups. We have carried out evaluation experiments, and the experimental results have shown the efficiency of our method.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2305.19458 [pdf, other]

A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition

Authors: Shentong Mo, Pedro Morgado

Abstract: The ability to accurately recognize, localize and separate sound sources is fundamental to any audio-visual perception task. Historically, these abilities were tackled separately, with several methods developed independently for each task. However, given the interconnected nature of source localization, separation, and recognition, independent models are likely to yield suboptimal performance as they fail to capture the interdependence between these tasks. To address this problem, we propose a unified audio-visual learning framework (dubbed OneAVM) that integrates audio and visual cues for joint localization, separation, and recognition. OneAVM comprises a shared audio-visual encoder and task-specific decoders trained with three objectives. The first objective aligns audio and visual representations through a localized audio-visual correspondence loss. The second tackles visual source separation using a traditional mix-and-separate framework. Finally, the third objective reinforces visual feature separation and localization by mixing images in pixel space and aligning their representations with those of all corresponding sound sources. Extensive experiments on MUSIC, VGG-Instruments, VGG-Music, and VGGSound datasets demonstrate the effectiveness of OneAVM for all three tasks, audio-visual source localization, separation, and nearest neighbor recognition, and empirically demonstrate a strong positive transfer between them.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.14095 [pdf, other]

S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions

Authors: Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jinwoo Shin

Abstract: Vision-language models, such as contrastive language-image pre-training (CLIP), have demonstrated impressive results in natural image domains. However, these models often struggle when applied to specialized domains like remote sensing, and adapting to such domains is challenging due to the limited number of image-text pairs available for training. To address this, we propose S-CLIP, a semi-supervised learning method for training CLIP that utilizes additional unpaired images. S-CLIP employs two pseudo-labeling strategies specifically designed for contrastive learning and the language modality. The caption-level pseudo-label is given by a combination of captions of paired images, obtained by solving an optimal transport problem between unpaired and paired images. The keyword-level pseudo-label is given by a keyword in the caption of the nearest paired image, trained through partial label learning that assumes a candidate set of labels for supervision instead of the exact one. By combining these objectives, S-CLIP significantly enhances the training of CLIP using only a few image-text pairs, as demonstrated in various specialist domains, including remote sensing, fashion, scientific figures, and comics. For instance, S-CLIP improves CLIP by 10% for zero-shot classification and 4% for image-text retrieval on the remote sensing benchmark, matching the performance of supervised CLIP while using three times fewer image-text pairs.

Submitted 25 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2305.12903 [pdf, other]

DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment

Authors: Shentong Mo, Jing Shi, Yapeng Tian

Abstract: Text-to-audio (TTA) generation is a recent popular problem that aims to synthesize general audio given text descriptions. Previous methods utilized latent diffusion models to learn audio embedding in a latent space with text embedding as the condition. However, they ignored the synchronization between audio and visual content in the video, and tended to generate audio mismatching from video frames. In this work, we propose a novel and personalized text-to-sound generation approach with visual alignment based on latent diffusion models, namely DiffAVA, that can simply fine-tune lightweight visual-text alignment modules with frozen modality-specific encoders to update visual-aligned text embeddings as the condition. Specifically, our DiffAVA leverages a multi-head attention transformer to aggregate temporal information from video features, and a dual multi-modal residual network to fuse temporal visual representations with text embeddings. Then, a contrastive learning objective is applied to match visual-aligned text embeddings with audio features. Experimental results on the AudioCaps dataset demonstrate that the proposed DiffAVA can achieve competitive performance on visual-aligned text-to-audio generation.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.01836 [pdf, other]

AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation

Authors: Shentong Mo, Yapeng Tian

Abstract: Segment Anything Model (SAM) has recently shown its powerful effectiveness in visual segmentation tasks. However, there is less exploration concerning how SAM works on audio-visual tasks, such as visual sound localization and segmentation. In this work, we propose a simple yet effective audio-visual localization and segmentation framework based on the Segment Anything Model, namely AV-SAM, that can generate sounding object masks corresponding to the audio. Specifically, our AV-SAM simply leverages pixel-wise audio-visual fusion across audio features and visual features from the pre-trained image encoder in SAM to aggregate cross-modal representations. Then, the aggregated cross-modal features are fed into the prompt encoder and mask decoder to generate the final audio-visual segmentation masks. We conduct extensive experiments on Flickr-SoundNet and AVSBench datasets. The results demonstrate that the proposed AV-SAM can achieve competitive performance on sounding object localization and segmentation.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2304.04399 [pdf, other]

CAVL: Learning Contrastive and Adaptive Representations of Vision and Language

Authors: Shentong Mo, Jingfei Xia, Ihor Markevych

Abstract: Visual and linguistic pre-training aims to learn vision and language representations together, which can be transferred to visual-linguistic downstream tasks. However, there exists semantic confusion between language and vision during the pre-training stage. Moreover, current pre-trained models tend to take lots of computation resources for fine-tuning when transferred to downstream tasks. In this work, we present a simple but effective approach for learning Contrastive and Adaptive representations of Vision and Language, namely CAVL. Specifically, we introduce a pair-wise contrastive loss to learn alignments between the whole sentence and each image in the same batch during the pre-training process. At the fine-tuning stage, we introduce two lightweight adaptation networks to reduce model parameters and increase training speed for saving computation resources. We evaluate our CAVL on six main downstream tasks, including Visual Question Answering (VQA), Visual Commonsense Reasoning (VCR), Natural Language for Visual Reasoning (NLVR), Region-to-Phrase Grounding (RPG), Text-to-Image Retrieval (TIR), and Zero-shot Text-to-Image Retrieval (ZS-TIR). Compared to baselines, we achieve superior performance and reduce the fine-tuning time by a large margin (in particular, 76.17%). Extensive experiments and ablation studies demonstrate the efficiency of contrastive pre-training and adaptive fine-tuning proposed in our CAVL.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.00425 [pdf, other]

doi 10.1103/PhysRevB.108.155121

Charge order induced Dirac pockets in the nonsymmorphic crystal TaTe$_4$

Authors: Yichen Zhang, Ruixiang Zhou, Hanlin Wu, Ji Seop Oh, Sheng Li, Jianwei Huang, Jonathan D. Denlinger, Makoto Hashimoto, Donghui Lu, Sung-Kwan Mo, Kevin F. Kelly, Gregory T. McCandless, Julia Y. Chan, Robert J. Birgeneau, Bing Lv, Gang Li, Ming Yi

Abstract: The interplay between charge order (CO) and nontrivial band topology has spurred tremendous interest in understanding topological excitations beyond the single-particle description. In a quasi-one-dimensional nonsymmorphic crystal TaTe$_4$, the (2a$\times$2b$\times$3c) charge ordered ground state drives the system into a space group where the symmetry indicator features the emergence of Dirac fermions and unconventional double Dirac fermions. Using angle-resolved photoemission spectroscopy and first-principles calculations, we provide evidence of the CO induced Dirac fermion-related bands near the Fermi level. Furthermore, the band folding at the Fermi level is compatible with the new periodicity dictated by the CO, indicating that the electrons near the Fermi level follow the crystalline symmetries needed to host double Dirac fermions in this system.

Submitted 25 March, 2024; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: 9 pages, 4 figures. The authorship of this paper has been amended to include new coauthors Dr. Gregory T. McCandless and Dr. Julia Y. Chan of Department of Chemistry and Biochemistry, Baylor University. Drs. McCandless and Chan were responsible for x-ray characterization of the sample used in this study. Erratum to be published on Phys. Rev. B

Journal ref: Phys. Rev. B. 108, 155121 (2023)

arXiv:2303.17056 [pdf, other]

Audio-Visual Grouping Network for Sound Localization from Mixtures

Authors: Shentong Mo, Yapeng Tian

Abstract: Sound source localization is a typical and challenging task that predicts the location of sound sources in a video. Previous single-source methods mainly used the audio-visual association as clues to localize sounding objects in each image. Due to the mixed property of multiple sound sources in the original space, there exist rare multi-source approaches to localizing multiple sources simultaneously, except for one recent work using a contrastive random walk in the graph with images and separated sound as nodes. Despite their promising performance, they can only handle a fixed number of sources, and they cannot learn compact class-aware representations for individual sources. To alleviate this shortcoming, in this paper, we propose a novel audio-visual grouping network, namely AVGN, that can directly learn category-wise semantic features for each source from the input audio mixture and image to localize multiple sources simultaneously. Specifically, our AVGN leverages learnable audio-visual class tokens to aggregate class-aware source features. Then, the aggregated semantic features for each source can be used as guidance to localize the corresponding visual regions. Compared to existing multi-source methods, our new framework can localize a flexible number of sources and disentangle category-aware audio-visual representations for individual sound sources. We conduct extensive experiments on MUSIC, VGGSound-Instruments, and VGG-Sound Sources benchmarks. The results demonstrate that the proposed AVGN can achieve state-of-the-art sounding object localization performance on both single-source and multi-source scenarios. Code is available at https://github.com/stoneMo/AVGN.

Submitted 29 March, 2023; originally announced March 2023.

Comments: CVPR 2023

arXiv:2303.12959 [pdf, other]

Variantional autoencoder with decremental information bottleneck for disentanglement

Authors: Jiantao Wu, Shentong Mo, Xiang Yang, Muhammad Awais, Sara Atito, Xingshen Zhang, Lin Wang, Xiang Yang

Abstract: One major challenge of disentanglement learning with variational autoencoders is the trade-off between disentanglement and reconstruction fidelity. Previous studies, which increase the information bottleneck during training, tend to lose the constraint of disentanglement, leading to the information diffusion problem. In this paper, we present a novel framework for disentangled representation learning, DeVAE, which utilizes hierarchical latent spaces with decreasing information bottlenecks across these spaces. The key innovation of our approach lies in connecting the hierarchical latent spaces through disentanglement-invariant transformations, allowing the sharing of disentanglement properties among spaces while maintaining an acceptable level of reconstruction performance. We demonstrate the effectiveness of DeVAE in achieving a balance between disentanglement and reconstruction through a series of experiments and ablation studies on dSprites and Shapes3D datasets. Code is available at https://github.com/erow/disentanglement_lib/tree/pytorch#devae.

Submitted 4 October, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.11622 [pdf]

doi 10.1073/pnas.2204630119

Differentiated roles of Lifshitz transition on thermodynamics and superconductivity in La2-xSrxCuO4

Authors: Yong Zhong, Zhuoyu Chen, Su-Di Chen, Ke-Jun Xu, Makoto Hashimoto, Yu He, Shin-ichi Uchida, Donghui Lu, Sung-Kwan Mo, Zhi-Xun Shen

Abstract: The effect of Lifshitz transition on thermodynamics and superconductivity in hole-doped cuprates has been heavily debated but remains an open question. In particular, an observed peak of electronic specific heat is proposed to originate from fluctuations of a putative quantum critical point p* (e.g. the termination of pseudogap at zero temperature), which is close to, but distinguishable from the Lifshitz transition in La-based cuprates. Here, we report an in situ angle-resolved photoemission spectroscopy study of three-dimensional Fermi surfaces in La2-xSrxCuO4 thin films (x = 0.06 - 0.35). With accurate kz dispersion quantification, the Lifshitz transition is determined to happen within a finite range around x = 0.21. Normal state electronic specific heat, calculated from spectroscopy-derived band parameters, agrees with previous thermodynamic microcalorimetry measurements. The account of the specific heat maximum by underlying band structures excludes the need for additionally dominant contribution from the quantum fluctuations at p*. A d-wave superconducting gap smoothly across the Lifshitz transition demonstrates the insensitivity of superconductivity to the dramatic density of states enhancement.

Submitted 21 March, 2023; originally announced March 2023.

Journal ref: Proc. Natl. Acad. Sci. U.S.A. 119, e2204630119 (2022)

arXiv:2303.04549 [pdf]

doi 10.1038/s41586-024-07023-w

Observation of plaid-like spin splitting in a noncoplanar antiferromagnet

Authors: Yu-Peng Zhu, Xiaobing Chen, Xiang-Rui Liu, Yuntian Liu, Pengfei Liu, Heming Zha, Gexing Qu, Caiyun Hong, Jiayu Li, Zhicheng Jiang, Xiao-Ming Ma, Yu-Jie Hao, Ming-Yuan Zhu, Wenjing Liu, Meng Zeng, Sreehari Jayaram, Malik Lenger, Jianyang Ding, Shu Mo, Kiyohisa Tanaka, Masashi Arita, Zhengtai Liu, Mao Ye, Dawei Shen, Jörg Wrachtrup, et al. (5 additional authors not shown)

Abstract: Spatial, momentum and energy separation of electronic spins in condensed matter systems guides the development of novel devices where spin-polarized current is generated and manipulated. Recent attention on a set of previously overlooked symmetry operations in magnetic materials leads to the emergence of a new type of spin splitting, enabling giant and momentum-dependent spin polarization of energy bands on selected antiferromagnets. Despite the ever-growing theoretical predictions, the direct spectroscopic proof of such spin splitting is still lacking. Here, we provide solid spectroscopic and computational evidence for the existence of such materials. In the noncoplanar antiferromagnet MnTe$_2$, the in-plane components of spin are found to be antisymmetric about the high-symmetry planes of the Brillouin zone, comprising a plaid-like spin texture in the antiferromagnetic (AFM) ground state. Such an unconventional spin pattern, further found to diminish at the high-temperature paramagnetic state, stems from the intrinsic AFM order instead of spin-orbit coupling (SOC). Our finding demonstrates a new type of quadratic spin texture induced by time-reversal breaking, placing AFM spintronics on a firm basis and paving the way for studying exotic quantum phenomena in related materials.

Submitted 4 January, 2024; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Version 3, 49 pages, 4 main figures, 13 extended data figures and 2 extended data tables. Nature in press (2024)

Journal ref: Nature 626, 523-528 (2024)

arXiv:2303.02848 [pdf]

doi 10.1103/PhysRevB.107.115141

Thermal hysteretic behavior and negative magnetoresistance in an unusual charge-density-wave material EuTe4

Authors: Q. Q. Zhang, Y. Shi, K. Y. Zhai, W. X. Zhao, X. Du, J. S. Zhou, X. Gu, R. Z. Xu, Y. D. Li, Y. F. Guo, Z. K. Liu, C. Chen, S. -K. Mo, T. K. Kim, C. Cacho, J. W. Yu, W. Li, Y. L. Chen, Jiun-Haw Chu, L. X. Yang

Abstract: EuTe4 is a newly-discovered van der Waals material exhibiting a novel charge-density wave (CDW) with a large thermal hysteresis in the resistivity and CDW gap. In this work, we systematically study the electronic structure and transport properties of EuTe4 using high-resolution angle-resolved photoemission spectroscopy (ARPES), magnetoresistance measurements, and scanning tunneling microscopy (STM). We observe a CDW gap of about 200 meV at low temperatures that persists up to 400 K, suggesting that the CDW transition occurs at a much higher temperature. We observe a large thermal hysteretic behavior of the ARPES intensity near the Fermi level, consistent with the resistivity measurement. The hysteresis in the resistivity measurement does not change under a magnetic field up to 7 T, excluding the thermal magnetic hysteresis mechanism. Instead, the surface topography measured with STM shows surface domains with different CDW trimerization directions, which may be important for the thermal hysteretic behavior of EuTe4. Interestingly, we observe a large negative magnetoresistance at low temperatures that can be associated with the canting of magnetically ordered Eu spins. Our work sheds light on the understanding of magnetic, transport, and electronic properties of EuTe4.

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2302.14483 [pdf, other]

RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data

Authors: Sangwoo Mo, Jong-Chyi Su, Chih-Yao Ma, Mido Assran, Ishan Misra, Licheng Yu, Sean Bell

Abstract: Semi-supervised learning aims to train a model using limited labels. State-of-the-art semi-supervised methods for image classification such as PAWS rely on self-supervised representations learned with large-scale unlabeled but curated data. However, PAWS is often less effective when using real-world unlabeled data that is uncurated, e.g., contains out-of-class data. We propose RoPAWS, a robust extension of PAWS that can work with real-world unlabeled data. We first reinterpret PAWS as a generative classifier that models densities using kernel density estimation. From this probabilistic perspective, we calibrate its prediction based on the densities of labeled and unlabeled data, which leads to a simple closed-form solution from the Bayes' rule. We demonstrate that RoPAWS significantly improves PAWS for uncurated Semi-iNat by +5.3% and curated ImageNet by +0.4%.

Submitted 28 February, 2023; originally announced February 2023.

Comments: ICLR 2023

arXiv:2302.10506 [pdf, other]

Diffusion Probabilistic Models for Structured Node Classification

Authors: Hyosoon Jang, Seonghyun Park, Sangwoo Mo, Sungsoo Ahn

Abstract: This paper studies structured node classification on graphs, where the predictions should consider dependencies between the node labels. In particular, we focus on solving the problem for partially labeled graphs where it is essential to incorporate the information in the known label for predicting the unknown labels. To address this issue, we propose a novel framework leveraging the diffusion probabilistic model for structured node classification (DPM-SNC). At the heart of our framework is the extraordinary capability of DPM-SNC to (a) learn a joint distribution over the labels with an expressive reverse diffusion process and (b) make predictions conditioned on the known labels utilizing manifold-constrained sampling. Since the DPMs lack training algorithms for partially labeled data, we design a novel training algorithm to apply DPMs, maximizing a new variational lower bound. We also theoretically analyze how DPMs benefit node classification by enhancing the expressive power of GNNs based on proposing AGG-WL, which is strictly more powerful than the classic 1-WL test. We extensively verify the superiority of our DPM-SNC in diverse scenarios, which include not only the transductive setting on partially labeled graphs but also the inductive setting and unlabeled graphs.

Submitted 18 June, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2301.11104 [pdf, other]

Discovering and Mitigating Visual Biases through Keyword Explanation

Authors: Younghyun Kim, Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jaeho Lee, Jinwoo Shin

Abstract: Addressing biases in computer vision models is crucial for real-world AI deployments. However, mitigating visual biases is challenging due to their unexplainable nature, often identified indirectly through visualization or sample statistics, which necessitates additional human supervision for interpretation. To tackle this issue, we propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords. Specifically, we extract common keywords from the captions of mispredicted images to identify potential biases in the model. We then validate these keywords by measuring their similarity to the mispredicted images using a vision-language scoring model. The keyword explanation form of visual bias offers several advantages, such as a clear group naming for bias discovery and a natural extension for debiasing using these group names. Our experiments demonstrate that B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C. Additionally, B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet. For example, we discovered a contextual bias between "bee" and "flower" in ImageNet. We also highlight various applications of B2T keywords, including debiased training, CLIP prompting, and model comparison.

Submitted 26 March, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: CVPR 2024. First two authors contributed equally

arXiv:2301.07926 [pdf, ps, other]

Normalized Solutions to Kirchhoff Equation with Nonnegative Potential

Authors: Shuai Mo, Shiwang Ma

Abstract: This paper is concerned with the existence of solutions to the problem $$-\left(a+ b\int_{\mathbb{R}^{N}}|\nabla u|^{2} dx \right)\Delta u +V(x)u+\lambda u = |u|^{p-2}u,\ \ x \in \mathbb{R}^{N},\ \ \lambda \in \mathbb{R}^{+} $$ where $a, b>0$ are constants, $V \geq 0$ is a potential, $N \geq 1$, and $p \in (2+ \frac{4}{N},2^*)$. We use a more subtle analysis to revisit the limited problem ($V \equiv 0$), and obtain a new energy inequality and bifurcation results. Based on these observations, we establish the existence of bound state normalized solutions under different assumptions on $V$. These conclusions extend some known results in previous papers.

Submitted 19 January, 2023; originally announced January 2023.

arXiv:2301.06667 [pdf]

doi 10.1002/adma.202312004

Imaging the breakdown and restoration of topological protection in magnetic topological insulator MnBi$_2$Te$_4$

Authors: Qile Li, Iolanda Di Bernardo, Johnathon Maniatis, Daniel McEwen, Liam Watson, Benjamin Lowe, Thi-Hai-Yen Vu, Chi Xuan Trang, Jinwoong Hwang, Sung-Kwan Mo, Michael S. Fuhrer, Mark T. Edmonds

Abstract: Quantum anomalous Hall (QAH) insulators transport charge without resistance along topologically protected chiral one-dimensional edge states. Yet, in magnetic topological insulators (MTI) to date, topological protection is far from robust, with the zero-magnetic field QAH effect only realised at temperatures an order of magnitude below the Néel temperature TN, though small magnetic fields can stabilize QAH effect. Understanding why topological protection breaks down is therefore essential to realising QAH effect at higher temperatures. Here we use a scanning tunnelling microscope to directly map the size of the exchange gap (Eg,ex) and its spatial fluctuation in the QAH insulator 5-layer MnBi$_2$Te$_4$. We observe long-range fluctuations of Eg,ex with values ranging between 0 (gapless) and 70 meV, uncorrelated to individual point defects. We directly image the breakdown of topological protection, showing that the chiral edge state, the hallmark signature of a QAH insulator, hybridizes with extended gapless metallic regions in the bulk. Finally, we unambiguously demonstrate that the gapless regions originate in magnetic disorder, by demonstrating that a small magnetic field restores Eg,ex in these regions, explaining the recovery of topological protection in magnetic fields. Our results indicate that overcoming magnetic disorder is key to exploiting the unique properties of QAH insulators.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2212.06595 [pdf, other]

OAMixer: Object-aware Mixing Layer for Vision Transformers

Authors: Hyunwoo Kang, Sangwoo Mo, Jinwoo Shin

Abstract: Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have shown impressive results on various visual recognition tasks, alternating classic convolutional networks. While the initial patch-based models (ViTs) treated all patches equally, recent studies reveal that incorporating inductive bias like spatiality benefits the representations. However, most prior works solely focused on the location of patches, overlooking the scene structure of images. Thus, we aim to further guide the interaction of patches using the object information. Specifically, we propose OAMixer (object-aware mixing layer), which calibrates the patch mixing layers of patch-based models based on the object labels. Here, we obtain the object labels in unsupervised or weakly-supervised manners, i.e., no additional human-annotating cost is necessary. Using the object labels, OAMixer computes a reweighting mask with a learnable scale parameter that intensifies the interaction of patches containing similar objects and applies the mask to the patch mixing layers. By learning an object-centric representation, we demonstrate that OAMixer improves the classification accuracy and background robustness of various patch-based models, including ViTs, MLP-Mixers, and ConvMixers. Moreover, we show that OAMixer enhances various downstream tasks, including large-scale classification, self-supervised learning, and multi-object recognition, verifying the generic applicability of OAMixer.

Submitted 13 December, 2022; originally announced December 2022.

Comments: CVPR Transformers for Vision Workshop 2022. First two authors contributed equally

arXiv:2212.02090 [pdf, other]

Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling

Authors: Junhyun Nam, Sangwoo Mo, Jaeho Lee, Jinwoo Shin

Abstract: To capture the relationship between samples and labels, conditional generative models often inherit spurious correlations from the training dataset. This can result in label-conditional distributions that are imbalanced with respect to another latent attribute. To mitigate this issue, which we call spurious causality of conditional generation, we propose a general two-step strategy. (a) Fairness Intervention (FI): emphasize the minority samples that are hard to generate due to the spurious correlation in the training dataset. (b) Corrective Sampling (CS): explicitly filter the generated samples and ensure that they follow the desired latent attribute distribution. We have designed the fairness intervention to work for various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios. Our experimental results demonstrate that FICS can effectively resolve spurious causality of conditional generation across various datasets.

Submitted 4 July, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: TMLR 2023

arXiv:2211.16690 [pdf]

doi 10.1103/PhysRevB.107.195114

Electronic Origin of Half-metal to Semiconductor Transition and Colossal Magnetoresistance in Spinel HgCr2Se4

Authors: Aiji Liang, Zhilin Li, Shihao Zhang, Shucui Sun, Shuai Liu, Cheng Chen, Haifeng Yang, Shengtao Cui, Sung-Kwan Mo, Shuai Yang, Yongqing Li, Meixiao Wang, Lexian Yang, Jianpeng Liu, Zhongkai Liu, Yulin Chen

Abstract: Half-metals are ferromagnets hosting spin-polarized conducting carriers and crucial for spintronics applications. The chromium spinel HgCr2Se4 represents a unique type of half-metal, which features a half-metal to semiconductor transition (HMST) and exhibits colossal magnetoresistance (CMR) across the ferromagnetic-paramagnetic (FM-PM) transition. Using angle-resolved photoemission spectroscopy (ARPES), we find that the Fermi surface of n-type HgCr2Se4 (n-HgCr2Se4) consists of a single electron pocket which moves above the Fermi level (EF) upon the FM-PM transition, leading to the HMST. Such a Lifshitz transition manifests a giant band splitting which originates from the exchange interaction unveiled with a specific chemical nonstoichiometry. The exchange band splitting and the chemical nonstoichiometry are two key ingredients to the HMST and CMR, consistent with our ab-initio calculation. Our findings provide spectroscopic evidences of the electronic origin of the anomalous properties of HgCr2Se4, which address the unique phase transition in half-metals.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.09074 [pdf, other]

Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge

Authors: Fangzhou Mu, Sicheng Mo, Gillian Wang, Yin Li

Abstract: This report describes our submission to the Ego4D Moment Queries Challenge 2022. Our submission builds on ActionFormer, the state-of-the-art backbone for temporal action localization, and a trio of strong video features from SlowFast, Omnivore and EgoVLP. Our solution is ranked 2nd on the public leaderboard with 21.76% average mAP on the test set, which is nearly three times higher than the official baseline. Further, we obtain 42.54% Recall@1x at tIoU=0.5 on the test set, outperforming the top-ranked solution by a significant margin of 1.41 absolute percentage points. Our code is available at https://github.com/happyharrycn/actionformer_release.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 2nd place in ECCV 2022 Ego4D Moment Queries Challenge

arXiv:2211.08704 [pdf, other]

A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge

Authors: Sicheng Mo, Fangzhou Mu, Yin Li

Abstract: This report describes Badgers@UW-Madison, our submission to the Ego4D Natural Language Queries (NLQ) Challenge. Our solution inherits the point-based event representation from our prior work on temporal action localization, and develops a Transformer-based model for video grounding. Further, our solution integrates several strong video features including SlowFast, Omnivore and EgoVLP. Without bells and whistles, our submission based on a single model achieves 12.64% Mean R@1 and is ranked 2nd on the public leaderboard. Meanwhile, our method garners 28.45% (18.03%) R@5 at tIoU=0.3 (0.5), surpassing the top-ranked solution by up to 5.5 absolute percentage points.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures

arXiv:2211.08114 [pdf]

Metal to Mott Insulator Transition in Two-dimensional 1T-TaSe$_2$

Authors: Ning Tian, Zhe Huang, Bo Gyu Jang, Shuaifei Guo, Ya-Jun Yan, Jingjing Gao, Yijun Yu, Jinwoong Hwang, Meixiao Wang, Xuan Luo, Yu Ping Sun, Zhongkai Liu, Dong-Lai Feng, Xianhui Chen, Sung-Kwan Mo, Minjae Kim, Young-Woo Son, Dawei Shen, Wei Ruan, Yuanbo Zhang

Abstract: When electron-electron interaction dominates over other electronic energy scales, exotic, collective phenomena often emerge out of seemingly ordinary matter. The strongly correlated phenomena, such as quantum spin liquid and unconventional superconductivity, represent a major research frontier and a constant source of inspiration. Central to strongly correlated physics is the concept of Mott insulator, from which various other correlated phases derive. The advent of two-dimensional (2D) materials brings unprecedented opportunities to the study of strongly correlated physics in the 2D limit. In particular, the enhanced correlation and extreme tunability of 2D materials enables exploring strongly correlated systems across uncharted parameter space. Here, we discover an intriguing metal to Mott insulator transition in 1T-TaSe$_2$ as the material is thinned down to atomic thicknesses. Specifically, we discover, for the first time, that the bulk metallicity of 1T-TaSe$_2$ arises from a band crossing Fermi level. Reducing the dimensionality effectively quenches the kinetic energy of the initially itinerant electrons and drives the material into a Mott insulating state. The dimensionality-driven Metal to Mott insulator transition resolves the long-standing dichotomy between metallic bulk and insulating surface of 1T-TaSe$_2$. Our results additionally establish 1T-TaSe$_2$ as an ideal variable system for exploring various strongly correlated phenomena.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2210.10194 [pdf, other]

Rethinking Prototypical Contrastive Learning through Alignment, Uniformity and Correlation

Authors: Shentong Mo, Zhun Sun, Chao Li

Abstract: Contrastive self-supervised learning (CSL) with a prototypical regularization has been introduced in learning meaningful representations for downstream tasks that require strong semantic information. However, to optimize CSL with a loss that performs the prototypical regularization aggressively, e.g., the ProtoNCE loss, might cause the "coagulation" of examples in the embedding space. That is, the intra-prototype diversity of samples collapses to trivial solutions for their prototype being well-separated from others. Motivated by previous works, we propose to mitigate this phenomenon by learning Prototypical representation through Alignment, Uniformity and Correlation (PAUC). Specifically, the ordinary ProtoNCE loss is revised with: (1) an alignment loss that pulls embeddings from positive prototypes together; (2) a uniformity loss that distributes the prototypical level features uniformly; (3) a correlation loss that increases the diversity and discriminability between prototypical level features. We conduct extensive experiments on various benchmarks where the results demonstrate the effectiveness of our method in improving the quality of prototypical contrastive representations. Particularly, in the classification down-stream tasks with linear probes, our proposed method outperforms the state-of-the-art instance-wise and prototypical contrastive learning methods on the ImageNet-100 dataset by 2.96% and the ImageNet-1K dataset by 2.46% under the same settings of batch size and epochs.

Submitted 18 October, 2022; originally announced October 2022.

Comments: BMVC 2022

arXiv:2210.02010 [pdf]

doi 10.1002/adma.202204579

A novel $\sqrt{19}\times\sqrt{19}$ superstructure in epitaxially grown 1T-TaTe$_2$

Authors: Jinwoong Hwang, Yeongrok Jin, Canxun Zhang, Tiancong Zhu, Kyoo Kim, Yong Zhong, Ji-Eun Lee, Zongqi Shen, Yi Chen, Wei Ruan, Hyejin Ryu, Choongyu Hwang, Jaekwang Lee, Michael F. Crommie, Sung-Kwan Mo, Zhi-Xun Shen

Abstract: The spontaneous formation of electronic orders is a crucial element for understanding complex quantum states and engineering heterostructures in two-dimensional materials. We report a novel $\sqrt{19}\times\sqrt{19}$ charge order in few-layer thick 1T-TaTe$_2$ transition metal dichalcogenide films grown by molecular beam epitaxy, which has not been realized. Our photoemission and scanning probe measurements demonstrate that monolayer 1T-TaTe$_2$ exhibits a variety of metastable charge density wave orders, including the $\sqrt{19}\times\sqrt{19}$ superstructure, which can be selectively stabilized by controlling the post-growth annealing temperature. Moreover, we find that only the $\sqrt{19}\times\sqrt{19}$ order persists in 1T-TaTe$_2$ films thicker than a monolayer, up to 8 layers. Our findings identify the previously unrealized novel electronic order in a much-studied transition metal dichalcogenide and provide a viable route to control it within the epitaxial growth process.

Submitted 4 October, 2022; originally announced October 2022.

Journal ref: Advanced materials 34, 2204579 (2022)

arXiv:2210.00314 [pdf, other]

Learning Hierarchical Image Segmentation For Recognition and By Recognition

Authors: Tsung-Wei Ke, Sangwoo Mo, Stella X. Yu

Abstract: Large vision and language models learned directly through image-text associations often lack detailed visual substantiation, whereas image segmentation tasks are treated separately from recognition, supervisedly learned without interconnections. Our key observation is that, while an image can be recognized in multiple ways, each has a consistent part-and-whole visual organization. Segmentation thus should be treated not as an end task to be mastered through supervised learning, but as an internal process that evolves with and supports the ultimate goal of recognition. We propose to integrate a hierarchical segmenter into the recognition process, train and adapt the entire model solely on image-level recognition objectives. We learn hierarchical segmentation for free alongside recognition, automatically uncovering part-to-whole relationships that not only underpin but also enhance recognition. Enhancing the Vision Transformer (ViT) with adaptive segment tokens and graph pooling, our model surpasses ViT in unsupervised part-whole discovery, semantic segmentation, image classification, and efficiency. Notably, our model (trained on unlabeled 1M ImageNet images) outperforms SAM (trained on 11M images and 1 billion masks) by absolute 8% in mIoU on PartImageNet object segmentation.

Submitted 2 May, 2024; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: ICLR 2024 (spotlight). First two authors contributed equally. Code available at https://github.com/twke18/CAST

ACM Class: I.4.6; I.4.10; I.5.3

arXiv:2209.09634 [pdf, other]

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Authors: Shentong Mo, Pedro Morgado

Abstract: Audio-visual source localization is a challenging task that aims to predict the location of visual sound sources in a video. Since collecting ground-truth annotations of sounding objects can be costly, a plethora of weakly-supervised localization methods that can learn from datasets with no bounding-box annotations have been proposed in recent years, by leveraging the natural co-occurrence of audio and visual signals. Despite significant interest, popular evaluation protocols have two major flaws. First, they allow for the use of a fully annotated dataset to perform early stopping, thus significantly increasing the annotation effort required for training. Second, current evaluation metrics assume the presence of sound sources at all times. This is of course an unrealistic assumption, and thus better metrics are necessary to capture the model's performance on (negative) samples with no visible sound sources. To accomplish this, we extend the test set of popular benchmarks, Flickr SoundNet and VGG-Sound Sources, in order to include negative samples, and measure performance using metrics that balance localization accuracy and recall. Using the new protocol, we conducted an extensive evaluation of prior methods, and found that most prior works are not capable of identifying negatives and suffer from significant overfitting problems (rely heavily on early stopping for best results). We also propose a new approach for visual sound source localization that addresses both these problems. In particular, we found that, through extreme visual dropout and the use of momentum encoders, the proposed approach combats overfitting effectively, and establishes a new state-of-the-art performance on both Flickr SoundNet and VGG-Sound Source. Code and pre-trained models are available at https://github.com/stoneMo/SLAVC.

Submitted 30 August, 2022; originally announced September 2022.

arXiv:2208.08819 [pdf, other]

Siamese Prototypical Contrastive Learning

Authors: Shentong Mo, Zhun Sun, Chao Li

Abstract: Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised approach. The ordinary CSL embeds the features extracted from neural networks onto specific topological structures. During the training progress, the contrastive loss draws the different views of the same input together while pushing the embeddings from different inputs apart. One of the drawbacks of CSL is that the loss term requires a large number of negative samples to provide better mutual information bound ideally. However, increasing the number of negative samples by larger running batch size also enhances the effects of false negatives: semantically similar samples are pushed apart from the anchor, hence downgrading downstream performance. In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework. The key insight is to employ siamese-style metric loss to match intra-prototype features, while increasing the distance between inter-prototype features. We conduct extensive experiments on various benchmarks where the results demonstrate the effectiveness of our method on improving the quality of visual representations. Specifically, our unsupervised pre-trained ResNet-50 with a linear probe, out-performs the fully-supervised trained version on the ImageNet-1K dataset.

Submitted 18 August, 2022; originally announced August 2022.

Comments: BMVC 2021

arXiv:2207.08186 [pdf]

doi 10.1103/PhysRevB.106.035129

Persistent exchange splitting in a chiral helimagnet Cr1/3NbS2

Authors: Na Qin, Cheng Chen, Shiqiao Du, Xian Du, Xin Zhang, Zhongxu Yin, Jingsong Zhou, Runzhe Xu, Xu Gu, Qinqin Zhang, Wenxuan Zhao, Yidian Li, Sung-Kwan Mo, Zhongkai Liu, Shilei Zhang, Yanfeng Guo, P. Z. Tang, Yulin Chen, Lexian Yang

Abstract: Using high-resolution angle-resolved photoemission spectroscopy (ARPES) and ab-initio calculation, we systematically investigate the electronic structure of the chiral helimagnet Cr1/3NbS2 and its temperature evolution. The comparison with NbS2 suggests that the electronic structure of Cr1/3NbS2 is strongly modified by the intercalation of Cr atoms. Our ab-initio calculation, consistent with experimental result, suggests strong hybridization between Nb- and Cr-derived states near the Fermi level. In the chiral helimagnetic state (below the Curie temperature Tc), we observe exchange splitting of the energy bands crossing EF, which follows the temperature evolution of the magnetic moment, suggesting an important role of the conduction electrons in the long-range magnetic ordering. Interestingly, the exchange splitting persists far above Tc with negligible temperature dependence, in drastic contrast to the itinerant ferromagnetism described by the Stoner model, indicating the existence of short-range magnetic order. Our results provide important insights into the microscopic mechanism of the chiral helimagnetic ordering in Cr1/3NbS2.

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2205.14339 [pdf, other]

Spectral Evidence for Unidirectional Charge Density Wave in Detwinned BaNi$_2$As$_2$

Authors: Yucheng Guo, Mason Klemm, Ji Seop Oh, Yaofeng Xie, Bing-Hua Lei, Sergey Gorovikov, Tor Pedersen, Matteo Michiardi, Sergey Zhdanovich, Andrea Damascelli, Jonathan Denlinger, Makoto Hashimoto, Donghui Lu, Sung-Kwan Mo, Rob G. Moore, Robert J. Birgeneau, David J. Singh, Pengcheng Dai, Ming Yi

Abstract: The emergence of unconventional superconductivity in proximity to intertwined electronic orders is especially relevant in the case of iron-based superconductors. Such order consists of an electronic nematic order and a spin density wave in these systems. BaNi$_2$As$_2$, like its well-known iron-based analog BaFe$_2$As$_2$, also hosts a symmetry-breaking structural transition that is coupled to a unidirectional charge density wave (CDW), providing a novel platform to study intertwined orders. Here, through a systematic angle-resolved photoemission spectroscopy study combined with a detwinning $B_{1g}$ uniaxial strain, we identify distinct spectral evidence of band evolution due to the structural transition as well as CDW-induced band folding. In contrast to the nematicity and spin density wave in BaFe$_2$As$_2$, the structural and CDW order parameters in BaNi$_2$As$_2$ are observed to be strongly coupled and do not separate in the presence of uniaxial strain. Our measurements point to a likely lattice origin of the CDW in BaNi$_2$As$_2$.

Submitted 28 May, 2022; originally announced May 2022.

Comments: 6 pages, 4 figures

arXiv:2205.14338 [pdf, other]

Object-wise Masked Autoencoders for Fast Pre-training

Authors: Jiantao Wu, Shentong Mo

Abstract: Self-supervised pre-training for images without labels has recently achieved promising performance in image classification. The success of transformer-based methods, ViT and MAE, draws the community's attention to the design of backbone architecture and self-supervised task. In this work, we show that current masked image encoding models learn the underlying relationship between all objects in the whole scene, instead of a single object representation. Therefore, those methods bring a lot of compute time for self-supervised pre-training. To solve this issue, we introduce a novel object selection and division strategy to drop non-object patches for learning object-wise representations by selective reconstruction with interested region masks. We refer to this method ObjMAE. Extensive experiments on four commonly-used datasets demonstrate the effectiveness of our model in reducing the compute cost by 72% while achieving competitive performance. Furthermore, we investigate the inter-object and intra-object relationship and find that the latter is crucial for self-supervised pre-training.

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.01679 [pdf, other]

Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging

Authors: Fangzhou Mu, Sicheng Mo, Jiayong Peng, Xiaochun Liu, Ji Hyun Nam, Siddeshwar Raghavan, Andreas Velten, Yin Li

Abstract: Computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms. A recent development towards practical NLOS imaging, Nam et al. demonstrated a high-speed non-confocal imaging system that operates at 5Hz, 100x faster than the prior art. This enormous gain in acquisition rate, however, necessitates numerous approximations in light transport, breaking many existing NLOS reconstruction methods that assume an idealized image formation model. To bridge the gap, we present a novel deep model that incorporates the complementary physics priors of wave propagation and volume rendering into a neural network for high-quality and robust NLOS reconstruction. This orchestrated design regularizes the solution space by relaxing the image formation model, resulting in a deep model that generalizes well on real captures despite being exclusively trained on synthetic data. Further, we devise a unified learning framework that enables our model to be flexibly trained using diverse supervision signals, including target intensity images or even raw NLOS transient measurements. Once trained, our model renders both intensity and depth images at inference time in a single forward pass, capable of processing more than 5 captures per second on a high-end GPU. Through extensive qualitative and quantitative experiments, we show that our method outperforms prior physics and learning based approaches on both synthetic and real measurements. We anticipate that our method along with the fast capturing system will accelerate future development of NLOS imaging for real world applications that require high-speed imaging.

Submitted 5 August, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: ICCP 2022 (TPAMI Special Issue on Computational Photography). Project page: https://pages.cs.wisc.edu/~fmu/nlos3d/

arXiv:2204.11263 [pdf]

Observation of Dimension-Crossover of a Tunable 1D Dirac Fermion in Topological Semimetal NbSi$_x$Te$_2$

Authors: Jing Zhang, Yangyang Lv, Xiaolong Feng, Aiji Liang, Wei Xia, Sung-Kwan Mo, Cheng Chen, Jiamin Xue, Shengyuan A. Yang, Lexian Yang, Yanfeng Guo, Yanbin Chen, Yulin Chen, Zhongkai Liu

Abstract: Condensed matter systems in low dimensions exhibit emergent physics that does not exist in three dimensions. When electrons are confined to one dimension (1D), some significant electronic states appear, such as charge density wave, spin-charge separations and Su-Schrieffer-Heeger (SSH) topological state. However, a clear understanding of how the 1D electronic properties connects with topology is currently lacking. Here we systematically investigated the characteristic 1D Dirac fermion electronic structure originated from the metallic NbTe$_2$ chains on the surface of the composition-tunable layered compound NbSi$_x$Te$_2$ ($x$ = 0.40 and 0.43) using angle-resolved photoemission spectroscopy. We found the Dirac fermion forms a Dirac nodal line structure protected by the combined $\widetilde{\mathcal{M}}{\rm_y}$ and time-reversal symmetry T and proves the NbSi$_x$Te$_2$ system as a topological semimetal, in consistent with the ab-initio calculations. As $x$ decreases, the interaction between adjacent NbTe2 chains increases and Dirac fermion goes through a dimension-crossover from 1D to 2D, as evidenced by the variation of its Fermi surface and Fermi velocity across the Brillouin zone in consistence with a Dirac SSH model. Our findings demonstrate a tunable 1D Dirac electron system, which offers a versatile platform for the exploration of intriguing 1D physics and device applications.

Submitted 24 April, 2022; originally announced April 2022.

Comments: 24 pages, 4 figures, to be published in npj Quantum Materials

arXiv:2204.11259 [pdf]

doi 10.1021/acsnano.1c03666

Direct Visualization and Manipulation of Tunable Quantum Well State in Semiconducting Nb2SiTe4

Authors: Jing Zhang, Zhilong Yang, Shuai Liu, Wei Xia, Tongshuai Zhu, Cheng Chen, Chengwei Wang, Meixiao Wang, Sung-Kwan Mo, Lexian Yang, Xufeng Kou, Yanfeng Guo, Haijun Zhang, Zhongkai Liu, Yulin Chen

Abstract: Quantum well states (QWSs) can form at the surface or interfaces of materials with a confinement potential. They have broad applications in electronic and optical devices such as high-electron-mobility transistors, photodetectors and quantum well lasers. The properties of the QWSs are usually the key factors for the performance of the devices. However, direct visualization and manipulation of such states are in general challenging. In this work, by using angle-resolved photoemission spectroscopy (ARPES) and scanning tunneling microscopy/spectroscopy (STM/STS), we directly probe the QWSs generated on the vacuum interface of a narrow band gap semiconductor Nb2SiTe4. Interestingly, the position and splitting of QWSs could be easily manipulated via potassium (K) dosing of the sample surface. Our results suggest that Nb2SiTe4 is an intriguing semiconductor system in which to study and engineer QWSs, with great potential for device applications.

Submitted 24 April, 2022; originally announced April 2022.

Comments: 28 pages, 5 figures

Journal ref: ACS Nano 2021 15 (10), 15850-15857

arXiv:2204.11204 [pdf, ps, other]

doi 10.3389/fphy.2022.886459

Nematic fluctuations in the non-superconducting iron pnictide BaFe$_{1.9-x}$Ni$_{0.1}$Cr$_{x}$As$_{2}$

Authors: Dongliang Gong, Ming Yi, Meng Wang, Tao Xie, Wenliang Zhang, Sergey Danilkin, Guochu Deng, Xinzhi Liu, Jitae T. Park, Kazuhiko Ikeuchi, Kazuya Kamazawa, Sung-Kwan Mo, Makoto Hashimoto, Donghui Lu, Rui Zhang, Pengcheng Dai, Robert J. Birgeneau, Shiliang Li, Huiqian Luo

Abstract: The main driving force of the electronic nematic phase in iron-based superconductors is still under debate. Here, we report a comprehensive study on the nematic fluctuations in a non-superconducting iron pnictide system BaFe$_{1.9-x}$Ni$_{0.1}$Cr$_{x}$As$_{2}$ by electronic transport, angle-resolved photoemission spectroscopy (ARPES) and inelastic neutron scattering (INS) measurements. Previous neutron diffraction and transport measurements suggested that the collinear antiferromagnetism persists to $x=0.8$, with similar Néel temperature $T_N$ and structural transition temperature $T_s$ around 32 K, but the charge carriers change from electron type to hole type around $x=0.5$. In this study, we have found that the in-plane resistivity anisotropy also depends strongly on the Cr dopings and the type of charge carriers. While ARPES measurements suggest possibly weak orbital anisotropy onset near $T_s$ for both $x=0.05$ and $x=0.5$ compounds, INS experiments reveal clearly different onset temperatures of low-energy spin excitation anisotropy, which is likely related to the energy scale of spin nematicity. These results suggest that the interplay between the local spins on Fe atoms and the itinerant electrons on Fermi surfaces is crucial to the nematic fluctuations of iron pnictides, where the orbital degree of freedom may behave differently from the spin degree of freedom, and the transport properties are intimately related to the spin dynamics.

Submitted 24 April, 2022; originally announced April 2022.

Comments: 12 pages, 8 figures. Frontiers in Physics: Topic "Nematicity in Iron-Based Superconductors"

Journal ref: Front. Phys. 10, 886459 (2022)

arXiv:2204.05627 [pdf, other]

Proximal Policy Optimization Learning based Control of Congested Freeway Traffic

Authors: Shurong Mo, Nailong Wu, Jie Qi, Anqi Pan, Zhiguang Feng, Huaicheng Yan, Yueying Wang

Abstract: This study proposes a delay-compensated feedback controller based on proximal policy optimization (PPO) reinforcement learning to stabilize traffic flow in the congested regime by manipulating the time-gap of adaptive cruise control-equipped (ACC-equipped) vehicles. The traffic dynamics on a freeway segment are governed by an Aw-Rascle-Zhang (ARZ) model, consisting of $2\times 2$ nonlinear first-order partial differential equations (PDEs). Inspired by the backstepping delay compensator [18] but unlike its complex segmented control scheme, the PPO control is composed of three feedbacks, namely the current traffic flow velocity, the current traffic flow density and the control input from the previous step. The control gains for the three feedbacks are learned from the interaction between the PPO and the numerical simulator of the traffic system without knowing the system dynamics. Numerical simulation experiments are designed to compare the Lyapunov control, the backstepping control and the PPO control. The results show that for a delay-free system, the PPO control has a faster convergence rate and requires less control effort than the Lyapunov control. For a traffic system with input delay, the performance of the PPO controller is comparable to that of the backstepping controller, even when the delay value does not match. However, the PPO is robust to parameter perturbations, while the backstepping controller cannot stabilize a system where one of the parameters is disturbed by Gaussian noise.

Submitted 14 January, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

arXiv:2204.00298 [pdf, other]

Unitail: Detecting, Reading, and Matching in Retail Scene

Authors: Fangyi Chen, Han Zhang, Zaiwang Li, Jiachen Dou, Shentong Mo, Hao Chen, Yongxin Zhang, Uzair Ahmed, Chenchen Zhu, Marios Savvides

Abstract: To make full use of computer vision technology in stores, it is required to consider the actual needs that fit the characteristics of the retail scene. Pursuing this goal, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products that challenges algorithms for detecting, reading, and matching. With 1.8M quadrilateral-shaped instances annotated, the Unitail offers a detection dataset to align product appearance better. Furthermore, it provides a gallery-style OCR dataset containing 1454 product categories, 30k text regions, and 21k transcriptions to enable robust reading on products and motivate enhanced product matching. Besides benchmarking the datasets using various state-of-the-art methods, we customize a new detector for product detection and provide a simple OCR-based matching solution that verifies its effectiveness.

Submitted 20 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: ECCV 2022

arXiv:2203.16769 [pdf]

doi 10.1038/s41467-022-28542-y

Large-gap insulating dimer ground state in monolayer IrTe2

Authors: Jinwoong Hwang, Kyoo Kim, Canxun Zhang, Tiancong Zhu, Charlotte Herbig, Sooran Kim, Bongjae Kim, Yong Zhong, Mohamed Salah, Mohamed M. El-Desoky, Choongyu Hwang, Zhi-Xun Shen, Michael F. Crommie, Sung-Kwan Mo

Abstract: Monolayers of two-dimensional van der Waals materials exhibit novel electronic phases distinct from their bulk due to the symmetry breaking and reduced screening in the absence of the interlayer coupling. In this work, we combine angle-resolved photoemission spectroscopy and scanning tunneling microscopy/spectroscopy to demonstrate the emergence of a unique insulating 2 x 1 dimer ground state in monolayer 1T-IrTe2 that has a large band gap in contrast to the metallic bilayer-to-bulk forms of this material. First-principles calculations reveal that phonon and charge instabilities as well as local bond formation collectively enhance and stabilize a charge-ordered ground state. Our findings provide important insights into the subtle balance of interactions having similar energy scales that occurs in the absence of strong interlayer coupling, which offers new opportunities to engineer the properties of 2D monolayers.

Submitted 30 March, 2022; originally announced March 2022.

Journal ref: Nature communications 13, 906 (2022)

arXiv:2203.10584 [pdf, other]

Point3D: tracking actions as moving points with 3D CNNs

Authors: Shentong Mo, Jingfei Xia, Xiaoqing Tan, Bhiksha Raj

Abstract: Spatio-temporal action recognition has been a challenging task that involves detecting where and when actions occur. Current state-of-the-art action detectors are mostly anchor-based, requiring sensitive anchor designs and huge computations due to calculating large numbers of anchor boxes. Motivated by nascent anchor-free approaches, we propose Point3D, a flexible and computationally efficient network with high precision for spatio-temporal action recognition. Our Point3D consists of a Point Head for action localization and a 3D Head for action classification. Firstly, Point Head is used to track center points and knot key points of humans to localize the bounding box of an action. These location features are then piped into a time-wise attention to learn long-range dependencies across frames. The 3D Head is later deployed for the final action classification. Our Point3D achieves state-of-the-art performance on the JHMDB, UCF101-24, and AVA benchmarks in terms of frame-mAP and video-mAP. Comprehensive ablation studies also demonstrate the effectiveness of each module proposed in our Point3D.

Submitted 20 March, 2022; originally announced March 2022.

Comments: Accepted by the 32nd British Machine Vision Conference (BMVC 2021)

arXiv:2203.09705 [pdf]

doi 10.1103/PhysRevLett.129.176402

Tailoring Dirac fermions by in-situ tunable high-order moire pattern in graphene-monolayer xenon heterostructure

Authors: Chunlong Wu, Qiang Wan, Cao Peng, Shangkun Mo, Renzhe Li, Keming Zhao, Yanping Guo, Shengjun Yuan, Fengcheng Wu, Chendong Zhang, Nan Xu

Abstract: A variety of novel quantum phases have been achieved in twisted bilayer graphene (tBLG) and other moire superlattices recently, including correlated insulators, superconductivity, magnetism, and topological states. These phenomena are very sensitive to the moire superlattices, which can hardly be changed rapidly or intensely. Here, we report the experimental realization of a high-order moire pattern (a high-order interference pattern) in a graphene-monolayer xenon heterostructure (G/mXe), with the moire period tuned in situ from a few nanometers to infinity by changing the lattice constant of Xe through different annealing temperatures and pressures. We use angle-resolved photoemission spectroscopy to directly observe that replicas of the graphene Dirac cone emerge and move close to each other in momentum space as the moire pattern continuously expands in real space. When the moire period approaches infinity, the replicas finally overlap with each other and an energy gap is observed at the Dirac point induced by intervalley coupling, which is a manifestation of Kekule distortion. We construct a continuum moire Hamiltonian, which can explain the experimental results well. The form of the moire Hamiltonian in G/mXe is similar to that in tBLG, and a moire band with narrow bandwidth is predicted in G/mXe. However, the moire Hamiltonian couples Dirac fermions from different valleys in G/mXe, instead of ones from different layers in tBLG. Our work demonstrates a novel platform to study the continuous evolution of the moire pattern and its modulation effect on electronic structure, and provides an unprecedented approach for tailoring Dirac fermions with tunable intervalley coupling.

Submitted 17 March, 2022; originally announced March 2022.

Comments: 17 pages, 4 figures, supplementary materials available from the authors, submitted Feb. 2022

Journal ref: Phys. Rev. Lett. 129, 176402 (2022)

arXiv:2203.09324 [pdf, other]

Localizing Visual Sounds the Easy Way

Authors: Shentong Mo, Pedro Morgado

Abstract: Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training. Previous works often seek high audio-visual similarities for likely positive (sounding) regions and low similarities for likely negative regions. However, accurately distinguishing between sounding and non-sounding regions is challenging without manual annotations. In this work, we propose a simple yet effective approach for Easy Visual Sound Localization, namely EZ-VSL, without relying on the construction of positive and/or negative regions during training. Instead, we align audio and visual spaces by seeking audio-visual representations that are aligned in, at least, one location of the associated image, while not matching other images, at any location. We also introduce a novel object guided localization scheme at inference time for improved precision. Our simple and effective framework achieves state-of-the-art performance on two popular benchmarks, Flickr SoundNet and VGG-Sound Source. In particular, we improve the CIoU of the Flickr SoundNet test set from 76.80% to 83.94%, and on the VGG-Sound Source dataset from 34.60% to 38.85%. The code is available at https://github.com/stoneMo/EZ-VSL.

Submitted 29 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

arXiv:2203.03838 [pdf, other]

Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding

Authors: Shentong Mo, Daizong Liu, Wei Hu

Abstract: Query-based video grounding is an important yet challenging task in video understanding, which aims to localize the target segment in an untrimmed video according to a sentence query. Most previous works achieve significant progress by addressing this task in a fully-supervised manner with segment-level labels, which require high labeling cost. Although some recent efforts develop weakly-supervised methods that only need the video-level knowledge, they generally match multiple pre-defined segment proposals with query and select the best one, which lacks fine-grained frame-level details for distinguishing frames with high repeatability and similarity within the entire video. To alleviate the above limitations, we propose a self-contrastive learning framework to address the query-based video grounding task under a weakly-supervised setting. Firstly, instead of utilizing redundant segment proposals, we propose a new grounding scheme that learns frame-wise matching scores referring to the query semantic to predict the possible foreground frames by only using the video-level annotations. Secondly, since some predicted frames (i.e., boundary frames) are relatively coarse and exhibit similar appearance to their adjacent frames, we propose a coarse-to-fine contrastive learning paradigm to learn more discriminative frame-wise representations for distinguishing the false positive frames. In particular, we iteratively explore multi-scale hard negative samples that are close to positive samples in the representation space for distinguishing fine-grained frame-wise details, thus enforcing more accurate segment grounding. Extensive experiments on two challenging benchmarks demonstrate the superiority of our proposed method compared with the state-of-the-art methods.

Submitted 7 March, 2022; originally announced March 2022.

arXiv:2203.01311 [pdf, other]

High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning

Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov

Abstract: Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, these methods are focused on a small set of modalities primarily in language, vision, and audio. In order to accelerate generalization towards diverse and understudied modalities, this paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities. Since adding new models for every new modality becomes prohibitively expensive, a critical technical challenge is heterogeneity quantification: how can we measure which modalities encode similar information and interactions in order to permit parameter sharing with previous modalities? This paper proposes two new information theoretic metrics for heterogeneity quantification: (1) modality heterogeneity studies how similar 2 modalities {X1,X2} are by measuring how much information can be transferred from X1 to X2, while (2) interaction heterogeneity studies how similarly pairs of modalities {X1,X2}, {X3,X4} interact by measuring how much information can be transferred from fusing {X1,X2} to {X3,X4}. We show the importance of these 2 proposed metrics as a way to automatically prioritize the fusion of modalities that contain unique information or interactions. The result is a single model, HighMMT, that scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas. Not only does HighMMT outperform prior methods on the tradeoff between performance and efficiency, it also demonstrates a crucial scaling behavior: performance continues to improve with each modality added, and it transfers to entirely new modalities and tasks during fine-tuning.

Submitted 28 June, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: TMLR 2023, Code available at https://github.com/pliang279/HighMMT

arXiv:2202.10571 [pdf, other]

Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

Authors: Sihyun Yu, Jihoon Tack, Sangwoo Mo, Hyunsu Kim, Junho Kim, Jung-Woo Ha, Jinwoo Shin

Abstract: In the deep learning era, high-quality long video generation still remains challenging due to the spatio-temporal complexity and continuity of videos. Existing prior works have attempted to model video distribution by representing videos as 3D grids of RGB values, which impedes the scale of generated videos and neglects continuous dynamics. In this paper, we found that the recent emerging paradigm of implicit neural representations (INRs) that encodes a continuous signal into a parameterized neural network effectively mitigates the issue. By utilizing INRs of video, we propose dynamics-aware implicit generative adversarial network (DIGAN), a novel generative adversarial network for video generation. Specifically, we introduce (a) an INR-based video generator that improves the motion dynamics by manipulating the space and time coordinates differently and (b) a motion discriminator that efficiently identifies the unnatural motions without observing the entire long frame sequences. We demonstrate the superiority of DIGAN under various datasets, along with multiple intriguing properties, e.g., long video synthesis, video extrapolation, and non-autoregressive video generation. For example, DIGAN improves the previous state-of-the-art FVD score on UCF-101 by 30.7% and can be trained on 128-frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.

Submitted 21 February, 2022; originally announced February 2022.

Comments: ICLR 2022. Project page with videos and code: https://sihyun-yu.github.io/digan/

arXiv:2202.07224 [pdf]

doi 10.1038/s41567-022-01751-4

Evidence for a spinon Kondo effect in cobalt atoms on single-layer 1T-TaSe$_2$

Authors: Yi Chen, Wen-Yu He, Wei Ruan, Jinwoong Hwang, Shujie Tang, Ryan L. Lee, Meng Wu, Tiancong Zhu, Canxun Zhang, Hyejin Ryu, Feng Wang, Steven G. Louie, Zhi-Xun Shen, Sung-Kwan Mo, Patrick A. Lee, Michael F. Crommie

Abstract: Quantum spin liquids (QSLs) are highly entangled, disordered magnetic states that arise in frustrated Mott insulators and host exotic fractional excitations such as spinons and chargons. Despite being charge insulators, some QSLs are predicted to exhibit gapless itinerant spinons that yield metallic behavior in the spin channel. We have deposited isolated magnetic atoms onto single-layer (SL) 1T-TaSe$_2$, a gapless QSL candidate, to experimentally probe how itinerant spinons couple to impurity spin centers. Using scanning tunneling spectroscopy we observe the emergence of new, impurity-induced resonance peaks at the 1T-TaSe$_2$ Hubbard band edges when cobalt adatoms are positioned to have maximal spatial overlap with the Hubbard band charge distribution. These resonance peaks disappear when the spatial overlap is reduced or when the magnetic impurities are replaced with non-magnetic impurities. Theoretical simulations using a modified Anderson impurity model integrated with a gapless quantum spin liquid show that these resonance peaks are consistent with a Kondo resonance induced by spinons combined with spinon-chargon binding effects that arise due to QSL gauge-field fluctuations.

Submitted 15 February, 2022; originally announced February 2022.

Journal ref: Nature Physics 18, 1335 (2022)

arXiv:2202.06927 [pdf, other]

doi 10.1038/s41535-022-00441-x

Nonsymmorphic Symmetry-Protected Band Crossings in a Square-Net Metal PtPb$_4$

Authors: Han Wu, Alannah M. Hallas, Xiaochan Cai, Jianwei Huang, Ji Seop Oh, Vaideesh Loganathan, Ashley Weiland, Gregory T. McCandless, Julia Y. Chan, Sung-Kwan Mo, Donghui Lu, Makoto Hashimoto, Jonathan Denlinger, Robert J. Birgeneau, Andriy H. Nevidomskyy, Gang Li, Emilia Morosan, Ming Yi

Abstract: Topological semimetals with symmetry-protected band crossings have emerged as a rich landscape to explore intriguing electronic phenomena. Nonsymmorphic symmetries in particular have been shown to play an important role in protecting the crossings along a line (rather than a point) in momentum space. Here we report experimental and theoretical evidence for Dirac nodal line crossings along the Brillouin zone boundaries in PtPb$_4$, arising from the nonsymmorphic symmetry of its crystal structure. Interestingly, while the nodal lines would remain gapless in the absence of spin-orbit coupling (SOC), the SOC in this case plays a detrimental role to topology by lifting the band degeneracy everywhere except at a set of isolated points. Nevertheless, the nodal line is observed to have a bandwidth much smaller than that found in density functional theory (DFT). Our findings reveal PtPb$_4$ to be a material system with narrow crossings approximately protected by nonsymmorphic crystalline symmetries.

Submitted 25 March, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: 21 pages, 4 figures, accepted for publication in npj Quantum Mater

Journal ref: npj Quantum Mater. 7, 31 (2022)

arXiv:2202.03026 [pdf, other]

Context Autoencoder for Self-Supervised Representation Learning

Authors: Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

Abstract: We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. Pretraining involves two tasks: masked representation prediction - predict the representations for the masked patches, and masked patch reconstruction - reconstruct the masked patches. The network is an encoder-regressor-decoder architecture: the encoder takes the visible patches as input; the regressor predicts the representations of the masked patches, which are expected to be aligned with the representations computed from the encoder, using the representations of visible patches and the positions of visible and masked patches; the decoder reconstructs the masked patches from the predicted encoded representations. The CAE design encourages the separation of learning the encoder (representation) from completing the pretraining tasks (masked representation prediction and masked patch reconstruction), and making predictions in the encoded representation space empirically shows the benefit to representation learning. We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks: semantic segmentation, object detection and instance segmentation, and classification. The code will be available at https://github.com/Atten4Vis/CAE.

Submitted 10 August, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: Accepted by International Journal of Computer Vision (IJCV)

arXiv:2201.11592 [pdf]

doi 10.1038/s41467-023-36857-7

Evidences for the exciton gas phase and its condensation in monolayer 1T-ZrTe2

Authors: Yekai Song, Chunjing Jia, Hongyu Xiong, Binbin Wang, Zhicheng Jiang, Kui Huang, Jinwoong Hwang, Zhuojun Li, Choongyu Hwang, Zhongkai Liu, Dawei Shen, Jonathan Sobota, Patrick Kirchmann, Jiamin Xue, Thomas P. Devereaux, Sung-Kwan Mo, Zhi-Xun Shen, Shujie Tang

Abstract: The excitonic insulator (EI) is a Bose-Einstein condensation (BEC) of excitons bound by electron-hole interaction in a solid, which could support high-temperature BEC transition. The material realization of EI has been elusive, which is further challenged by the difficulty of distinguishing it from a conventional charge density wave (CDW) state. In the BEC limit, the pre-condensation exciton gas phase is a hallmark to distinguish EI from conventional CDW, yet direct experimental evidence has been lacking. Here we report a distinct correlated phase beyond the $2\times2$ CDW ground state emerging in epitaxially grown monolayer 1T-ZrTe2 and its investigation by angle-resolved photoemission spectroscopy (ARPES) and scanning tunneling microscopy (STM). The results show novel band- and energy-dependent folding behavior in a two-step process, evidenced by an exciton gas phase prior to its condensation into the final CDW state. The excellent agreement between experiments and theoretical predictions on the recovery of the pristine band structure by carrier-density-dependent suppression of the CDW state further corroborates the monolayer 1T-ZrTe2 as an EI. Our findings provide a versatile two-dimensional platform that allows tuning of the excitonic effect.

Submitted 30 March, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 22 pages, 4 figures

arXiv:2201.02667 [pdf, other]

doi 10.1038/s42005-022-00805-6

Correlation-Driven Electronic Reconstruction in FeTe$_{1-x}$Se$_x$

Authors: Jianwei Huang, Rong Yu, Zhijun Xu, Jian-Xin Zhu, Ji Seop Oh, Qianni Jiang, Meng Wang, Han Wu, Tong Chen, Jonathan D. Denlinger, Sung-Kwan Mo, Makoto Hashimoto, Matteo Michiardi, Tor M. Pedersen, Sergey Gorovikov, Sergey Zhdanovich, Andrea Damascelli, Genda Gu, Pengcheng Dai, Jiun-Haw Chu, Donghui Lu, Qimiao Si, Robert J. Birgeneau, Ming Yi

Abstract: Electronic correlation is of fundamental importance to high temperature superconductivity. While the low energy electronic states in cuprates are dominantly affected by correlation effects across the phase diagram, observation of correlation-driven changes in fermiology amongst the iron-based superconductors remains rare. Here we present experimental evidence for a correlation-driven reconstruction of the Fermi surface tuned independently by two orthogonal axes of temperature and Se/Te ratio in the iron chalcogenide family FeTe$_{1-x}$Se$_x$. We demonstrate that this reconstruction is driven by the de-hybridization of a strongly renormalized $d_{xy}$ orbital with the remaining itinerant iron 3$d$ orbitals in the emergence of an orbital-selective Mott phase. Our observations are further supported by our theoretical calculations to be salient spectroscopic signatures of such a non-thermal evolution from a strongly correlated metallic phase into an orbital-selective Mott phase in $d_{xy}$ as Se concentration is reduced.

Submitted 7 January, 2022; originally announced January 2022.

Comments: 25 pages, 5 figures, accepted version to appear in Communications Physics. arXiv admin note: text overlap with arXiv:2010.13913

Journal ref: Commun Phys 5, 29 (2022)

arXiv:2111.04146 [pdf, other]

Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning

Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

Abstract: Model predictive control (MPC) is increasingly being considered for control of fast systems and embedded applications. However, the MPC has some significant challenges for such systems. Its high computational complexity results in high power consumption from the control algorithm, which could account for a significant share of the energy resources in battery-powered embedded systems. The MPC parameters must be tuned, which is largely a trial-and-error process that affects the control performance, the robustness and the computational complexity of the controller to a high degree. In this paper, we propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL), with the goal of simultaneously optimizing the control performance and the power usage of the control algorithm. We propose the novel idea of optimizing the meta-parameters of MPC with RL, i.e. parameters affecting the structure of the MPC problem as opposed to the solution to a given problem. Our control algorithm is based on an event-triggered MPC where we learn when the MPC should be re-computed, and a dual mode MPC and linear state feedback control law applied in between MPC computations. We formulate a novel mixture-distribution policy and show that with joint optimization we achieve improvements that do not present themselves when optimizing the same parameters in isolation. We demonstrate our framework on the inverted pendulum control task, reducing the total computation time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.

Submitted 7 November, 2021; originally announced November 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Showing 51–100 of 248 results for author: Moe, S