
Showing 1–50 of 247 results for author: Moe, S

  1. arXiv:2407.01552  [pdf]

    cs.NI physics.optics

    High Spectral-Efficiency, Ultra-low MIMO SDM Transmission over a Field-Deployed Multi-Core OAM Fiber

    Authors: Junyi Liu, Zengquan Xu, Shuqi Mo, Yuming Huang, Yining Huang, Zhenhua Li, Yuying Guo, Lei Shen, Shuo Xu, Ran Gao, Cheng Du, Qian Feng, Jie Luo, Jie Liu, Siyuan Yu

    Abstract: Few-mode multi-core fiber (FM-MCF) based Space-Division Multiplexing (SDM) systems possess the potential to maximize the number of multiplexed spatial channels per fiber by harnessing both the space (fiber cores) and mode (optical mode per core) dimensions. However, to date, no SDM transmissions over field-deployed FM-MCFs in realistic outdoor settings have been reported, which contrasts with SDM… ▽ More

    Submitted 29 April, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures

  2. arXiv:2406.09386  [pdf, other]

    cs.CV

    SimGen: Simulator-conditioned Driving Scene Generation

    Authors: Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou

    Abstract: Controllable synthetic data generation can substantially lower the annotation cost of training data in autonomous driving research and development. Prior works use diffusion models to generate driving images conditioned on the 3D object layout. However, those models are trained on small-scale datasets like nuScenes, which lack appearance and layout diversity. Moreover, the trained models can only… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.08114  [pdf]

    cond-mat.mes-hall cond-mat.str-el cond-mat.supr-con

    Massive 1D Dirac Line, Solitons and Reversible Manipulation on the Surface of a Prototype Obstructed Atomic Insulator, Silicon

    Authors: Zhongkai Liu, Peng Deng, Yuanfeng Xu, Haifeng Yang, Ding Pei, Cheng Chen, Shanmei He, Defa Liu, Sung-Kwan Mo, Timur Kim, Cephise Cacho, Hong Yao, Zhi-Da Song, Xi Chen, Zhong Wang, Binghai Yan, Lexian Yang, Bogdan A. Bernevig, Yulin Chen

    Abstract: Topologically trivial insulators can be classified into atomic insulators (AIs) and obstructed atomic insulators (OAIs) depending on whether the Wannier charge centers are localized or not at spatial positions occupied by atoms. An OAI can possess unusual properties such as surface states along certain crystalline surfaces, which advantageously appear in materials with much larger bulk energy gap… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.07540  [pdf, other]

    cs.CV cs.LG

    Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

    Authors: Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou

    Abstract: Recent controllable generation approaches such as FreeControl and Diffusion Self-guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize the latent embedding for each type of score function with longer diffusion steps, making the generation process time-consuming and limiting their flexib… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures, see project page at https://genforce.github.io/ctrl-x

  5. arXiv:2406.05038  [pdf, other]

    cs.CV cs.AI cs.LG

    Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs

    Authors: Shentong Mo

    Abstract: Recent advancements in sequence modeling have led to the development of the Mamba architecture, noted for its selective state space approach, offering a promising avenue for efficient long sequence handling. However, its application in 3D shape generation, particularly at high resolutions, remains underexplored. Traditional diffusion transformers (DiT) with self-attention mechanisms, despite their… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  6. arXiv:2406.04930  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

    Authors: Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu

    Abstract: Recent advances in pre-trained vision transformers have shown promise in parameter-efficient audio-visual learning without audio pre-training. However, few studies have investigated effective methods for aligning multimodal features in parameter-efficient audio-visual transformers. In this paper, we propose MA-AVT, a new parameter-efficient audio-visual transformer employing deep modality alignmen… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted in Efficient Deep Learning for Computer Vision CVPR Workshop 2024

  7. arXiv:2405.17995  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture

    Authors: Shentong Mo, Sukmin Yun

    Abstract: The joint-embedding predictive architecture (JEPA) recently has shown impressive results in extracting visual representations from unlabeled imagery under a masking strategy. However, we reveal its disadvantages, notably its insufficient understanding of local semantics. This deficiency originates from masked modeling in the embedding space, resulting in a reduction of discriminative power and can… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  8. arXiv:2405.15881  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

    Authors: Shentong Mo, Yapeng Tian

    Abstract: In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional diffusion transformers (DiT), which utilize self-attention blocks, are effective but their computational complexity scales quadratically with the input length, l… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.07202  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Unified Video-Language Pre-training with Synchronized Audio

    Authors: Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang

    Abstract: Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two m… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  10. arXiv:2404.17808  [pdf, other]

    cs.CL

    Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

    Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

    Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while keeping all tokens that have be… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  11. arXiv:2404.13081  [pdf, other]

    cs.CL cs.AI cs.LG

    SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs

    Authors: Jaehyung Kim, Jaehyun Nam, Sangwoo Mo, Jongjin Park, Sang-Woo Lee, Minjoon Seo, Jung-Woo Ha, Jinwoo Shin

    Abstract: Large language models (LLMs) have made significant advancements in various natural language processing tasks, including question answering (QA) tasks. While incorporating new information with the retrieval of relevant passages is a promising way to improve QA with LLMs, the existing methods often require additional fine-tuning which becomes infeasible with recent LLMs. Augmenting retrieved passage… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted at ICLR 2024

  12. arXiv:2404.12876  [pdf, other]

    cs.CV cs.AI cs.LG

    A Large-scale Medical Visual Task Adaptation Benchmark

    Authors: Shentong Mo, Xufang Luo, Yansen Wang, Dongsheng Li

    Abstract: Visual task adaptation has been demonstrated to be effective in adapting pre-trained Vision Transformers (ViTs) to general downstream visual tasks using specialized learnable layers or tokens. However, there is yet a large-scale benchmark to fully explore the effect of visual task adaptation on the realistic and important medical domain, particularly across diverse medical visual modalities, such… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  13. arXiv:2404.11934  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall cond-mat.str-el

    Quantum simulation of honeycomb lattice model by high-order moiré pattern

    Authors: Qiang Wan, Chunlong Wu, Xun-Jiang Luo, Shenghao Dai, Cao Peng, Renzhe Li, Shangkun Mo, Keming Zhao, Wen-Xuan Qiu, Hao Zhong, Yiwei Li, Chendong Zhang, Fengcheng Wu, Nan Xu

    Abstract: Moiré superlattices have become an emergent solid-state platform for simulating quantum lattice models. However, in single moiré device, Hamiltonians parameters like lattice constant, hopping and interaction terms can hardly be manipulated, limiting the controllability and accessibility of moire quantum simulator. Here, by combining angle-resolved photoemission spectroscopy and theoretical analysi… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 19 pages, 5 figures

    Journal ref: Phy. Rev. B 109, L161102 (2024)

  14. arXiv:2404.10308  [pdf, other]

    cs.LG cs.AI

    Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

    Authors: Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin

    Abstract: Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted to ICLR 2024. The first two authors contributed equally

  15. arXiv:2404.02257  [pdf, other]

    cs.CV

    SnAG: Scalable and Accurate Video Grounding

    Authors: Fangzhou Mu, Sicheng Mo, Yin Li

    Abstract: Temporal grounding of text descriptions in videos is a central problem in vision-language learning and video understanding. Existing methods often prioritize accuracy over scalability -- they have been optimized for grounding only a few text queries within short videos, and fail to scale up to long videos with hundreds of queries. In this paper, we study the effect of cross-modal fusion on the sca… ▽ More

    Submitted 5 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Code available at https://github.com/fmu2/snag_release

  16. arXiv:2404.00509  [pdf, other]

    cs.LG cs.CV

    DailyMAE: Towards Pretraining Masked Autoencoders in One Day

    Authors: Jiantao Wu, Shentong Mo, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais

    Abstract: Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representation from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance the performance of downstream tasks. However, the high computational demands of pretraining po… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  17. arXiv:2403.13596  [pdf]

    cond-mat.str-el cond-mat.mtrl-sci

    Tailoring Physical Properties of Crystals through Synthetic Temperature Control: A Case Study for new Polymorphic NbFeTe2 phases

    Authors: Hanlin Wu, Sheng Li, Yan Lyu, Yucheng Guo, Wenhao Liu, Ji Seop Oh, Yichen Zhang, Sung-Kwan Mo, Clarina dela Cruz, Robert J. Birgeneau, Keith M. Taddei, Ming Yi, Li Yang, Bing Lv

    Abstract: Growth parameters play a significant role in the crystal quality and physical properties of layered materials. Here we present a case study on a van der Waals magnetic NbFeTe2 material. Two different types of polymorphic NbFeTe2 phases, synthesized at different temperatures, display significantly different behaviors in crystal symmetry, electronic structure, electrical transport, and magnetism. Wh… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 22 Pages, 6 figures

  18. arXiv:2403.11416  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall cond-mat.str-el

    Surface region band enhancement in noble gas adsorption assisted ARPES on kagome superconductor RbV3Sb5

    Authors: Cao Peng, Yiwei Li, Xu Chen, Shenghao Dai, Zewen Wu, Chunlong Wu, Qiang Wan, Keming Zhao, Renzhe Li, Shangkun Mo, Dingkun Qin, Shuming Yu, Hao Zhong, Shengjun Yuan, Jiangang Guo, Nan Xu

    Abstract: Electronic states near surface regions can be distinct from bulk states, which are paramount in understanding various physical phenomena occurring at surfaces and in applications in semiconductors, energy, and catalysis. Here, we report an abnormal surface region band enhancement effect in angle-resolved photoemission spectroscopy on kagome superconductor RbV3Sb5, by depositing noble gases with fi… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 17 pages, 4 figures

    Journal ref: Phys. Rev. B 109, 115415 (2024)

  19. Electronic structure of above-room-temperature van der Waals ferromagnet Fe$_3$GaTe$_2$

    Authors: Ji-Eun Lee, Shaohua Yan, Sehoon Oh, Jinwoong Hwang, Jonathan D. Denlinger, Choongyu Hwang, Hechang Lei, Sung-Kwan Mo, Se Young Park, Hyejin Ryu

    Abstract: Fe$_3$GaTe$_2$, a recently discovered van der Waals ferromagnet, demonstrates intrinsic ferromagnetism above room temperature, necessitating a comprehensive investigation of the microscopic origins of its high Curie temperature ($\textit{T}$$_C$). In this study, we reveal the electronic structure of Fe$_3$GaTe$_2$ in its ferromagnetic ground state using angle-resolved photoemission spectroscopy an… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 25 pages, 4 figures

    Journal ref: Nano Lett. 23 (2023) 11526-11532

  20. arXiv:2403.07938  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG cs.MM eess.AS

    Text-to-Audio Generation Synchronized with Videos

    Authors: Shentong Mo, Jing Shi, Yapeng Tian

    Abstract: In recent times, the focus on text-to-audio (TTA) generation has intensified, as researchers strive to synthesize audio from textual descriptions. However, most existing methods, though leveraging latent diffusion models to learn the correlation between audio and text embeddings, fall short when it comes to maintaining a seamless synchronization between the produced audio and its video. This often… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.12903

  21. arXiv:2403.05659  [pdf, other]

    cs.CV

    Audio-Synchronized Visual Animation

    Authors: Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado

    Abstract: Current visual generation methods can produce high quality videos guided by texts. However, effectively controlling object dynamics remains a challenge. This work explores audio as a cue to generate temporally synchronized image animations. We introduce Audio Synchronized Visual Animation (ASVA), a task animating a static image to demonstrate motion dynamics, temporally guided by audio clips acros… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 15 pages

  22. arXiv:2402.17406  [pdf, other]

    cs.CV cs.AI cs.LG

    LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning

    Authors: Shentong Mo, Yansen Wang, Xufang Luo, Dongsheng Li

    Abstract: Visual Prompt Tuning (VPT) techniques have gained prominence for their capacity to adapt pre-trained Vision Transformers (ViTs) to downstream visual tasks using specialized learnable tokens termed as prompts. Contemporary VPT methodologies, especially when employed with self-supervised vision transformers, often default to the introduction of new learnable prompts or gated prompt tokens predominan… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  23. arXiv:2402.14299  [pdf, other]

    cs.RO cs.AI

    We Choose to Go to Space: Agent-driven Human and Multi-Robot Collaboration in Microgravity

    Authors: Miao Xin, Zhongrui You, Zihan Zhang, Taoran Jiang, Tingjia Xu, Haotian Liang, Guojing Ge, Yuchen Ji, Shentong Mo, Jian Cheng

    Abstract: We present SpaceAgents-1, a system for learning human and multi-robot collaboration (HMRC) strategies under microgravity conditions. Future space exploration requires humans to work together with robots. However, acquiring proficient robot skills and adept collaboration under microgravity conditions poses significant challenges within ground laboratories. To address this issue, we develop a microg… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  24. arXiv:2402.07143  [pdf, ps, other]

    cond-mat.supr-con cond-mat.str-el

    Electronic structure of the alternating monolayer-trilayer phase of La3Ni2O7

    Authors: Sebastien N. Abadi, Ke-Jun Xu, Eder G. Lomeli, Pascal Puphal, Masahiko Isobe, Yong Zhong, Alexei V. Fedorov, Sung-Kwan Mo, Makoto Hashimoto, Dong-Hui Lu, Brian Moritz, Bernhard Keimer, Thomas P. Devereaux, Matthias Hepting, Zhi-Xun Shen

    Abstract: Recent studies of La$_3$Ni$_2$O$_7$ have identified a bilayer (2222) structure and an unexpected alternating monolayer-trilayer (1313) structure, both of which feature signatures of superconductivity near 80 K under high pressures. Using angle-resolved photoemission spectroscopy, we measure the electronic structure of 1313 samples. In contrast to the previously studied 2222 structure, we find that… ▽ More

    Submitted 25 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: Version 2: Improved data quality of the small electron pocket at the zone center ($ε$ band). Also, observations of multilayer splitting effects in the flat band ($γ$ and $δ$ bands) and in the large cuprate-like pockets ($β$ bands). Band structure calculations now use LDA instead of LDA+U. Main text: 7 pages, 3 figures. SM: 9 pages, 6 figures

  25. arXiv:2401.17607  [pdf, ps, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Engineering two-dimensional nodal semimetals in functionalized biphenylene by fluorine adatoms

    Authors: Seongjun Mo, Jaeuk Seo, Seok-Kyun Son, Sejoong Kim, Jun-Won Rhim, Hoonkyung Lee

    Abstract: We propose a new band engineering scheme on the biphenylene network, a newly synthesized carbon allotrope. First, we investigate the mechanism for the appearance of type II Dirac fermion in a pristine biphenylene network. We show that the essential ingredients are mirror symmetries and the stabilization of the compact localized eigenstates via destructive interference. While the former is used for… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

  26. arXiv:2401.12969  [pdf, ps, other]

    math.CO

    Monadic transductions and definable classes of matroids

    Authors: Susan Jowett, Dillon Mayhew, Songbao Mo, Christopher Tuffley

    Abstract: A transduction provides us with a way of using the monadic second-order language of a structure to make statements about a derived structure. Any transduction induces a relation on the set of these structures. This article presents a self-contained presentation of the theory of transductions for the monadic second-order language of matroids. This includes a proof of the matroid version of the Back… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

  27. arXiv:2401.03494  [pdf]

    cs.LG cs.CE physics.app-ph

    Pre-insertion resistors temperature prediction based on improved WOA-SVR

    Authors: Honghe Dai, Site Mo, Haoxin Wang, Nan Yin, Songhai Fan, Bixiong Li

    Abstract: The pre-insertion resistors (PIR) within high-voltage circuit breakers are critical components and warm up by generating Joule heat when an electric current flows through them. Elevated temperature can lead to temporary closure failure and, in severe cases, the rupture of PIR. To accurately predict the temperature of PIR, this study combines finite element simulation techniques with Support Vector… ▽ More

    Submitted 7 January, 2024; originally announced January 2024.

  28. arXiv:2312.11732  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Two-Step Electronic Response to Magnetic Ordering in a van der Waals Ferromagnet

    Authors: Han Wu, Jian-Xin Zhu, Lebing Chen, Matthew W Butcher, Ziqin Yue, Dongsheng Yuan, Yu He, Ji Seop Oh, Jianwei Huang, Shan Wu, Cheng Gong, Yucheng Guo, Sung-Kwan Mo, Jonathan D. Denlinger, Donghui Lu, Makoto Hashimoto, Matthew B. Stone, Alexander I. Kolesnikov, Songxue Chi, Junichiro Kono, Andriy H. Nevidomskyy, Robert J. Birgeneau, Pengcheng Dai, Ming Yi

    Abstract: The two-dimensional (2D) material Cr$_2$Ge$_2$Te$_6$ is a member of the class of insulating van der Waals magnets. Here, using high resolution angle-resolved photoemission spectroscopy in a detailed temperature dependence study, we identify a clear response of the electronic structure to a dimensional crossover in the form of two distinct temperature scales marking onsets of modifications in the e… ▽ More

    Submitted 20 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: PRB, in press

    Journal ref: Physical Review B 109, 045416 (2024)

  29. arXiv:2312.07536  [pdf, other]

    cs.CV

    FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

    Authors: Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou

    Abstract: Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each type of spatial condition, model architecture, and checkpoint, putting them at odds with the diverse intents and preferences a human designer would like to convey to the AI models during the content creation process. In this… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project Page: https://genforce.github.io/freecontrol/

  30. arXiv:2312.07231  [pdf, other]

    cs.CV cs.AI cs.LG

    Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

    Authors: Shentong Mo, Enze Xie, Yue Wu, Junsong Chen, Matthias Nießner, Zhenguo Li

    Abstract: Diffusion Transformers have recently shown remarkable effectiveness in generating high-quality 3D point clouds. However, training voxel-based diffusion models for high-resolution 3D voxels remains prohibitively expensive due to the cubic complexity of attention operators, which arises from the additional dimension of voxels. Motivated by the inherent redundancy of 3D compared to 2D, we propose Fas… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project Page: https://dit-3d.github.io/FastDiT-3D/

  31. arXiv:2312.06220  [pdf, other]

    cs.LG cs.AI

    Dance of Channel and Sequence: An Efficient Attention-Based Approach for Multivariate Time Series Forecasting

    Authors: Haoxin Wang, Yipeng Mo, Nan Yin, Honghe Dai, Bixiong Li, Songhai Fan, Site Mo

    Abstract: In recent developments, predictive models for multivariate time series analysis have exhibited commendable performance through the adoption of the prevalent principle of channel independence. Nevertheless, it is imperative to acknowledge the intricate interplay among channels, which fundamentally influences the outcomes of multivariate predictions. Consequently, the notion of channel independence,… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

  32. arXiv:2312.01186  [pdf, other]

    q-bio.BM

    Linker-Tuning: Optimizing Continuous Prompts for Heterodimeric Protein Prediction

    Authors: Shuxian Zou, Hui Li, Shentong Mo, Xingyi Cheng, Eric Xing, Le Song

    Abstract: Predicting the structure of interacting chains is crucial for understanding biological systems and developing new drugs. Large-scale pre-trained Protein Language Models (PLMs), such as ESM2, have shown impressive abilities in extracting biologically meaningful representations for protein structure prediction. In this paper, we show that ESMFold, which has been successful in computing accurate atom… ▽ More

    Submitted 2 December, 2023; originally announced December 2023.

  33. arXiv:2312.01118  [pdf, other]

    cs.CV

    Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning

    Authors: Jiantao Wu, Shentong Mo, Sara Atito, Josef Kittler, Zhenhua Feng, Muhammad Awais

    Abstract: Recently, self-supervised metric learning has raised attention for the potential to learn a generic distance function. It overcomes the limitations of conventional supervised one, e.g., scalability and label biases. Despite progress in this domain, current benchmarks, incorporating a narrow scope of classes, stop the nuanced evaluation of semantic representations. To bridge this gap, we introduce… ▽ More

    Submitted 2 December, 2023; originally announced December 2023.

  34. arXiv:2312.01017  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD

    Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling

    Authors: Shentong Mo, Pedro Morgado

    Abstract: Humans possess a remarkable ability to integrate auditory and visual information, enabling a deeper understanding of the surrounding environment. This early fusion of audio and visual cues, demonstrated through cognitive psychology and neuroscience research, offers promising potential for developing multimodal perception models. However, training early fusion architectures poses significant challe… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

  35. arXiv:2311.15080  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Weakly-Supervised Audio-Visual Segmentation

    Authors: Shentong Mo, Bhiksha Raj

    Abstract: Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as supervision. However, these pixel-level masks are expensive and not available in all cases. In this work, we aim to simplify the supervision as the instance-level annotat… ▽ More

    Submitted 25 November, 2023; originally announced November 2023.

  36. arXiv:2311.11285  [pdf, other]

    cs.LG

    TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss

    Authors: Site Mo, Haoxin Wang, Bixiong Li, Songhai Fan, Yuankai Wu, Xianggen Liu

    Abstract: Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. The real-world multivariate time series comes with noises and contains complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as… ▽ More

    Submitted 19 November, 2023; originally announced November 2023.

  37. arXiv:2311.06217  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

    Authors: Shentong Mo, Paul Pu Liang, Russ Salakhutdinov, Louis-Philippe Morency

    Abstract: The Internet of Things (IoT), the network integrating billions of smart physical devices embedded with sensors, software, and communication technologies for the purpose of connecting and exchanging data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geoloca… ▽ More

    Submitted 10 November, 2023; originally announced November 2023.

  38. arXiv:2311.01144  [pdf, ps, other]

    math.AG

    On Stable Rationality of Polytopes

    Authors: Simen Westbye Moe

    Abstract: Nicaise--Ottem introduced the notion of (stably) rational polytopes and studied this using a combinatorial description of the motivic volume. In this framework, we ask whether being non-stably rational is preserved under inclusions. We prove this holds for a large class of polytopes, leading to a combinatorial strategy for studying stable rationality of hypersurfaces in toric varieties. As a resul… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 34 pages

  39. arXiv:2310.18850  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models

    Authors: Shentong Mo, Zhun Sun, Chao Li

    Abstract: Data augmentation has become a standard component of vision pre-trained models to capture the invariance between augmented views. In practice, augmentation techniques that mask regions of a sample with zero/mean values or patches from other samples are commonly employed in pre-trained models with self-/semi-/fully-supervised contrastive losses. However, the underlying mechanism behind the effectiv… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

  40. arXiv:2309.15371  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall cond-mat.str-el

    From Stoner to Local Moment Magnetism in Atomically Thin Cr2Te3

    Authors: Yong Zhong, Cheng Peng, Haili Huang, Dandan Guan, Jinwoong Hwang, Kuan H. Hsu, Yi Hu, Chunjing Jia, Brian Moritz, Donghui Lu, Jun-Sik Lee, Jin-Feng Jia, Thomas P. Devereaux, Sung-Kwan Mo, Zhi-Xun Shen

    Abstract: The field of two-dimensional (2D) ferromagnetism has been proliferating over the past few years, with ongoing interests in basic science and potential applications in spintronic technology. However, a high-resolution spectroscopic study of the 2D ferromagnet is still lacking due to the small size and air sensitivity of the exfoliated nanoflakes. Here, we report a thickness-dependent ferromagnetism… ▽ More

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 32 pages, 4 + 10 figures

    Journal ref: Nature Communications 14, 5340 (2023)

  41. arXiv:2309.07694  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Tree of Uncertain Thoughts Reasoning for Large Language Models

    Authors: Shentong Mo, Miao Xin

    Abstract: While the recently introduced Tree of Thoughts (ToT) has heralded advancements in allowing Large Language Models (LLMs) to reason through foresight and backtracking for global decision-making, it has overlooked the inherent local uncertainties in intermediate decision points or "thoughts". These local uncertainties, intrinsic to LLMs given their potential for diverse responses, remain a significan… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

  42. arXiv:2309.05281  [pdf, other

    cs.CV cs.LG cs.MM

    Class-Incremental Grouping Network for Continual Audio-Visual Learning

    Authors: Shentong Mo, Weiguo Pian, Yapeng Tian

    Abstract: Continual learning is a challenging problem in which models need to be trained on non-stationary data across sequential tasks for class-incremental learning. While previous methods have focused on using either regularization or rehearsal-based frameworks to alleviate catastrophic forgetting in image classification, they are limited to a single modality and cannot learn compact class-aware cross-mo… ▽ More

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. arXiv admin note: text overlap with arXiv:2303.17056

  43. arXiv:2308.11448  [pdf, other

    cs.CV cs.LG

    Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding

    Authors: Jiantao Wu, Shentong Mo, Muhammad Awais, Sara Atito, Zhenhua Feng, Josef Kittler

    Abstract: Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. In the realm of computer vision, pretrained vision transformers (ViTs) have played a pivotal role in advancing transfer learning. Nonetheless, the escalating cost of finetuning these large models has posed a challenge due to… ▽ More

    Submitted 22 August, 2023; originally announced August 2023.

  44. arXiv:2308.11073  [pdf, other

    cs.CV

    Audio-Visual Class-Incremental Learning

    Authors: Weiguo Pian, Shentong Mo, Yunhui Guo, Yapeng Tian

    Abstract: In this paper, we introduce audio-visual class-incremental learning, a class-incremental learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve semantic similarity between audio and visual features as incremental step grows. Furthermore, we observe that audio-visual correlati… ▽ More

    Submitted 14 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  45. arXiv:2307.12679  [pdf, other

    cs.LG math.NA

    An Estimator for the Sensitivity to Perturbations of Deep Neural Networks

    Authors: Naman Maheshwari, Nicholas Malaya, Scott Moe, Jaydeep P. Kulkarni, Sudhanva Gurumurthi

    Abstract: For Deep Neural Networks (DNNs) to become useful in safety-critical applications, such as self-driving cars and disease diagnosis, they must be stable to perturbations in input and model parameters. Characterizing the sensitivity of a DNN to perturbations is necessary to determine minimal bit-width precision that may be used to safely represent the network. However, no general result exists that i… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Actual work and paper concluded in January 2019

  46. Antiferromagnetic topological insulating state in Tb$_{0.02}$Bi$_{1.08}$Sb$_{0.9}$Te$_2$S single crystals

    Authors: Lei Guo, Weiyao Zhao, Qile Li, Meng Xu, Lei Chen, Abdulhakim Bake, Thi-Hai-Yen Vu, Yahua He, Yong Fang, David Cortie, Sung-Kwan Mo, Mark Edmonds, Xiaolin Wang, Shuai Dong, Julie Karel, Ren-Kui Zheng

    Abstract: Topological insulators are emerging materials with insulating bulk and symmetry protected nontrivial surface states. One of the most fascinating transport behaviors in a topological insulator is the quantized anomalous Hall insulator, which has been observed in magnetic-topological-insulator-based devices. In this work, we report a successful doping of rare earth element Tb into Bi$_{1.08}$Sb… ▽ More

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 15 pages, 3 figures

    Journal ref: Physical Review B 107.12 (2023): 125125

  47. arXiv:2307.03154  [pdf, other

    cond-mat.str-el cond-mat.mtrl-sci

    Reversible Non-Volatile Electronic Switching in a Near Room Temperature van der Waals Ferromagnet

    Authors: Han Wu, Lei Chen, Paul Malinowski, Jianwei Huang, Qinwen Deng, Kirsty Scott, Bo Gyu Jang, Jacob P. C. Ruff, Yu He, Xiang Chen, Chaowei Hu, Ziqin Yue, Ji Seop Oh, Xiaokun Teng, Yucheng Guo, Mason Klemm, Chuqiao Shi, Yue Shi, Chandan Setty, Tyler Werner, Makoto Hashimoto, Donghui Lu, T. Yilmaz, Elio Vescovo, Sung-Kwan Mo, et al. (15 additional authors not shown)

    Abstract: The ability to reversibly toggle between two distinct states in a non-volatile method is important for information storage applications. Such devices have been realized for phase-change materials, which utilizes local heating methods to toggle between a crystalline and an amorphous state with distinct electrical properties. To expand such kind of switching between two topologically distinct phases… ▽ More

    Submitted 6 July, 2023; originally announced July 2023.

    Journal ref: Nat Commun 15, 2739 (2024)

  48. arXiv:2307.01831  [pdf, other

    cs.CV cs.AI cs.LG

    DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

    Authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li

    Abstract: Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape genera… ▽ More

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Project Page: https://dit-3d.github.io/

  49. arXiv:2306.16329  [pdf, other

    cs.CV

    DiffComplete: Diffusion-based Generative 3D Shape Completion

    Authors: Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nießner, Chi-Wing Fu, Jiaya Jia

    Abstract: We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, we strike a balance between realism, multi-modality, and high fidelity. We propose DiffComplete by casting shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature agg… ▽ More

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Project Page: https://ruihangchu.com/diffcomplete.html

  50. arXiv:2306.14490  [pdf, other

    cs.CV cs.AI

    TaiChi Action Capture and Performance Analysis with Multi-view RGB Cameras

    Authors: Jianwei Li, Siyu Mo, Yanfei Shen

    Abstract: Recent advances in computer vision and deep learning have influenced the field of sports performance analysis for researchers to track and reconstruct freely moving humans without any marker attachment. However, there are few works for vision-based motion capture and intelligent analysis for professional TaiChi movement. In this paper, we propose a framework for TaiChi performance capture and anal… ▽ More

    Submitted 26 June, 2023; originally announced June 2023.