Showing 1–50 of 291 results for author: Cai, M

  1. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18020  [pdf, other]

    cs.LG cs.AI physics.chem-ph

    MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

    Authors: Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

    Abstract: Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most ex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  3. arXiv:2406.15160  [pdf, other]

    eess.AS eess.SP

    Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

    Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

    Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  4. arXiv:2406.13793  [pdf, other]

    cs.HC

    Exploring the Optimal Time Window for Predicting Cognitive Load Using Physiological Sensor Data

    Authors: Minghao Cai, Carrie Demmans Epp

    Abstract: Learning analytics has begun to use physiological signals because these have been linked with learners' cognitive and affective states. These signals, when interpreted through machine learning techniques, offer a nuanced understanding of the temporal dynamics of student learning experiences and processes. However, there is a lack of clear guidance on the optimal time window to use for analyzing ph…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Presented at PhysioCHI: Towards Best Practices for Integrating Physiological Signals in HCI, May 11, 2024, Honolulu, HI, USA

  5. arXiv:2406.09400  [pdf, other]

    cs.CV cs.LG

    Yo'LLaVA: Your Personalized Language and Vision Assistant

    Authors: Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in o…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://thaoshibe.github.io/YoLLaVA

  6. arXiv:2406.03038  [pdf]

    eess.SY

    Study on layout of double rotated serpentine springs for vertical-comb-driven torsional micromirror

    Authors: Biyun Ling, Yuhu Xia, Minli Cai, Xiaoyue Wang, Yaming Wu

    Abstract: The combination of double rotated serpentine springs (RSS) and vertical comb-drive is a suitable solution for the development of torsional micromirror with high fill factor, low fabrication difficulty and good performance. However, the alignment error between upper and lower comb set caused by fabrication can induce force with unexpected direction. And the cross-axis coupled spring constants in do…

    Submitted 5 June, 2024; originally announced June 2024.

  7. arXiv:2406.02721  [pdf, other]

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation pro…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 41 pages, 12 figures, 41 tables; Website: https://llm-self-control.github.io/

  8. arXiv:2405.20718  [pdf, other]

    cs.IR cs.AI

    Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias

    Authors: Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang

    Abstract: Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias,…

    Submitted 11 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  9. arXiv:2405.17430  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Matryoshka Multimodal Models

    Authors: Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed large number of visual tokens and then feed them into a Large Language Model (LLM). However, this design causes an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to great inefficiency. While…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://matryoshka-mm.github.io/

  10. Multimodality Invariant Learning for Multimedia-Based New Item Recommendation

    Authors: Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang

    Abstract: Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and APPs, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at the inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing (e.g., m…

    Submitted 28 April, 2024; originally announced May 2024.

  11. arXiv:2405.10818  [pdf]

    cs.SI

    Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System

    Authors: Jiawei Feng, Mengsi Cai, Fangze Dai, Tianci Bu, Xiaoyu Zhang, Huijun Zheng, Xin Lu

    Abstract: In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.10428 by other authors

  12. arXiv:2405.06670  [pdf, other]

    cs.LO cs.LG

    TLINet: Differentiable Neural Network Temporal Logic Inference

    Authors: Danyang Li, Mingyu Cai, Cristian-Ioan Vasile, Roberto Tron

    Abstract: There has been a growing interest in extracting formal descriptions of the system behaviors from data. Signal Temporal Logic (STL) is an expressive formal language used to describe spatial-temporal properties with interpretability. This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. The computation in TLINet is differentiable, enabling the usage of off-the-shelf gr…

    Submitted 14 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  13. arXiv:2405.05543  [pdf, ps, other]

    cs.HC

    Predicting Cognitive Load Using Sensor Data in a Literacy Game

    Authors: Minghao Cai, Carrie Demmans Epp

    Abstract: Educational games are being increasingly used to support self-paced learning. However, educators and system designers often face challenges in monitoring student affect and cognitive load. Existing assessments in game-based learning environments (GBLEs) tend to focus more on outcomes rather than processes, potentially overlooking key aspects of the learning journey that include learner affect and…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This work has been accepted by the 17th International Conference on Educational Data Mining

  14. arXiv:2404.14673  [pdf, other]

    quant-ph physics.optics

    High-Dimensional Two-Photon Quantum Controlled Phase-Flip Gate

    Authors: Mingyuan Chen, Jiangshan Tang, Miao Cai, Franco Nori, Keyu Xia

    Abstract: High-dimensional quantum systems have been used to reveal interesting fundamental physics and to improve information capacity and noise resilience in quantum information processing. However, it remains a significant challenge to realize universal two-photon quantum gates in high dimensions with high success probability. Here, by considering an ion-cavity QED system, we theoretically propose, to th…

    Submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  16. arXiv:2404.12860  [pdf, other]

    quant-ph

    Nonreciprocal PT-symmetric phase transition in a non-Hermitian chiral quantum optical system

    Authors: Miao Cai, Jiang-Shan Tang, Ming-Yuan Chen, Keyu Xia

    Abstract: Phase transitions, non-Hermiticity and nonreciprocity play central roles in fundamental physics. However, the triple interplay of these three fields is of lack in the quantum domain. Here, we show nonreciprocal parity-time-symmetric phase transition in a non-Hermitian chiral quantum electrodynamical system, caused by the directional system dissipation. In remarkable contrast to previously reported…

    Submitted 21 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 6 pages, 4 figures

  17. arXiv:2404.11193  [pdf, other]

    quant-ph

    Photonic indistinguishability characterization and optimization for cavity-based single-photon source

    Authors: Miao Cai, Mingyuan Chen, Jiangshan Tang, Keyu Xia

    Abstract: Indistinguishability of single photons from independent sources is critically important for scalable quantum technologies. We provide a comprehensive comparison of single-photon indistinguishability of different kinds of cavity quantum electrodynamics (CQED) systems by numerically simulating Hong-Ou-Mandel (HOM) two-photon interference. We find that the CQED system using nature atoms exhibit super…

    Submitted 17 April, 2024; originally announced April 2024.

  18. arXiv:2404.08121  [pdf, other]

    math.CO math.AG

    Symmetric Tropical Rank 2 Matrices

    Authors: May Cai, Kisun Lee, Josephine Yu

    Abstract: We study the tropicalization of the space of symmetric rank two matrices. Analogously to the result of Markwig and Yu for general tropical rank two matrices, we show that it has a simplicial complex structure as the space of symmetric bicolored trees and that this simplicial complex is shellable. We also discuss some matroid structures arising from this space and present generating functions for t…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 20 pages, 8 figures

    MSC Class: 14T15

  19. arXiv:2404.00101  [pdf, other]

    math.GT math.QA

    Quandle Action Quivers

    Authors: Mason Cai, Sam Nelson

    Abstract: Quandle Coloring Quivers are directed graph-valued invariants of classical and virtual knots and links associated to finite quandles. Quandle action quivers are subquivers of the full quandle coloring quiver associated to quandle actions by elements of the coloring quandle. These quivers provide a categorification of the quandle counting invariant associated to each element of the quandle. We obta…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 8 pages

    MSC Class: 57K12

  20. arXiv:2403.19927  [pdf, ps, other]

    math.NA

    Parameter choice strategies for regularized least squares approximation of noisy continuous functions on the unit circle

    Authors: Congpei An, Mou Cai

    Abstract: In this paper, we consider a trigonometric polynomial reconstruction of continuous periodic functions from their noisy values at equidistant nodes of the unit circle by a regularized least squares method. We indicate that the constructed trigonometric polynomial can be determined in explicit due to the exactness of trapezoidal rule. Then a concrete error bound is derived based on the estimation of…

    Submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.19770  [pdf, other]

    cs.RO cs.AI cs.LG

    Hierarchical Deep Learning for Intention Estimation of Teleoperation Manipulation in Assembly Tasks

    Authors: Mingyu Cai, Karankumar Patel, Soshi Iba, Songpo Li

    Abstract: In human-robot collaboration, shared control presents an opportunity to teleoperate robotic manipulation to improve the efficiency of manufacturing and assembly processes. Robots are expected to assist in executing the user's intentions. To this end, robust and prompt intention estimation is needed, relying on behavioral observations. The framework presents an intention estimation technique at hie…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: ICRA 2024

  22. arXiv:2403.18348  [pdf, other]

    cs.IR

    Sequential Recommendation with Latent Relations based on Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang

    Abstract: Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeli…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  23. arXiv:2403.18325  [pdf, other]

    cs.IR

    Common Sense Enhanced Knowledge-based Recommendation with Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Min Zhang, Qingyao Ai, Yiqun Liu, Mingchen Cai

    Abstract: Knowledge-based recommendation models effectively alleviate the data sparsity issue leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the for…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by DASFAA 2024

  24. arXiv:2403.15388  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

    Authors: Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

    Abstract: Large Multimodal Models (LMMs) have shown significant visual reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically take in a fixed and large amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which…

    Submitted 22 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://llava-prumerge.github.io/

  25. arXiv:2403.14125  [pdf, other]

    stat.ML cs.LG

    Learning causal graphs using variable grouping according to ancestral relationship

    Authors: Ming Cai, Hisayuki Hara

    Abstract: Several causal discovery algorithms have been proposed. However, when the sample size is small relative to the number of variables, the accuracy of estimating causal graphs using existing methods decreases. And some methods are not feasible when the sample size is smaller than the number of variables. To circumvent these problems, some researchers proposed causal structure learning algorithms usin…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  26. arXiv:2403.04369  [pdf, other]

    cs.AI cs.CL

    From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction

    Authors: Ang Li, Qiangchao Chen, Yiquan Wu, Ming Cai, Xiang Zhou, Fei Wu, Kun Kuang

    Abstract: Confusing charge prediction is a challenging task in legal AI, which involves predicting confusing charges based on fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing c…

    Submitted 24 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  27. arXiv:2403.04366  [pdf, other]

    cs.AI

    Enhancing Court View Generation with Knowledge Injection and Guidance

    Authors: Ang Li, Yiquan Wu, Yifei Liu, Fei Wu, Ming Cai, Kun Kuang

    Abstract: Court View Generation (CVG) is a challenging task in the field of Legal Artificial Intelligence (LegalAI), which aims to generate court views based on the plaintiff claims and the fact descriptions. While Pretrained Language Models (PLMs) have showcased their prowess in natural language generation, their application to the complex, knowledge-intensive domain of CVG often reveals inherent limitatio…

    Submitted 7 March, 2024; originally announced March 2024.

  28. arXiv:2403.03790  [pdf, other]

    cs.CV

    Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Guoqiang Lei, Yin Zhuang, Xuerui Mao

    Abstract: Ship detection needs to identify ship locations from remote sensing (RS) scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. To address this challenge, in this article, leveraging the large language models (LLMs)'s powerful g…

    Submitted 13 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  29. arXiv:2403.03730  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning 3D object-centric representation through prediction

    Authors: John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

    Abstract: As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning. While humans develop the ability of perceiving objects situated in 3D environments without supervision, models that learn the same set of abilities with similar constraints faced by human infants are lacking. Towards this end, we de…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 11 figures. Project webpage can be found at https://jday54.github.io/opple_site/

    ACM Class: I.2.10; I.4.8; I.4.6; I.4.10; I.2.6

  30. arXiv:2403.02614  [pdf, other]

    quant-ph physics.optics

    Generation of True Quantum Random Numbers with On-Demand Probability Distributions via Single-Photon Quantum Walks

    Authors: Chaoying Meng, Miao Cai, Yufang Yang, Haodong Wu, Zhixiang Li, Yaping Ruan, Yong Zhang, Han Zhang, Keyu Xia, Franco Nori

    Abstract: Random numbers are at the heart of diverse fields, ranging from simulations of stochastic processes to classical and quantum cryptography. The requirement for true randomness in these applications has motivated various proposals for generating random numbers based on the inherent randomness of quantum systems. The generation of true random numbers with arbitrarily defined probability distributions…

    Submitted 4 March, 2024; originally announced March 2024.

  31. arXiv:2403.01686  [pdf, other]

    astro-ph.HE astro-ph.GA

    AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission

    Authors: Shifeng Huang, Ning Jiang, Jiazheng Zhu, Yibo Wang, Tinggui Wang, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang, Yu-Jing Qin, Zheyu Lin, Lin-Na Xu, Min-Xuan Cai, Ji-An Jiang, Xu Kong, Jiaxun Li, Long Li, Jian-Guo Wang, Ze-Lin Xu, Yongquan Xue, Ye-Fei Yuan, Jingquan Cheng, Lulu Fan, Jie Gao, Lei Hu, Weida Hu, et al. (20 additional authors not shown)

    Abstract: High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The…

    Submitted 26 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 14 pages, 8 figures, accepted for publication by ApJL

  32. arXiv:2402.18166  [pdf, other]

    cs.IR

    Sequence-level Semantic Representation Fusion for Recommender Systems

    Authors: Lanling Xu, Zhen Tian, Bingqian Li, Junjie Zhang, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao

    Abstract: With the rapid development of recommender systems, there is increasing side information that can be employed to improve the recommendation performance. Specially, we focus on the utilization of the associated textual data of items (e.g., product title) and study how text features can be effectively fused with ID features in sequential recommendation. However, there exists distinct data charact…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures

  33. arXiv:2402.13254  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

    Authors: Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee

    Abstract: We propose CounterCurate, a framework to comprehensively improve the visio-linguistic compositional reasoning capability for both contrastive and generative multimodal models. In particular, we identify two critical under-explored problems: the neglect of the physically grounded reasoning (counting and position understanding) and the potential of using highly capable text and image generation mode…

    Submitted 12 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 15 pages, 6 figures, 12 tables, Project Page: https://countercurate.github.io/

  34. arXiv:2402.13194  [pdf, other]

    quant-ph cs.IT

    Quantum Wiretap Channel Coding Assisted by Noisy Correlation

    Authors: Minglai Cai, Andreas Winter

    Abstract: We consider the private classical capacity of a quantum wiretap channel, where the users (sender Alice, receiver Bob, and eavesdropper Eve) have access to the resource of a shared quantum state, additionally to their channel inputs and outputs. An extreme case is maximal entanglement or a secret key between Alice and Bob, both of which would allow for onetime padding the message. But here both the…

    Submitted 11 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Journal ref: Proc. ISIT 2024, Athens (Greece), 7-12 July 2024

  35. arXiv:2402.01137  [pdf, ps, other]

    math.PR math.NA

    Long-time dynamics of stochastic wave equation with dissipative damping and its full discretization: exponential ergodicity and strong law of large numbers

    Authors: Meng Cai, Chuchu Chen, Jialin Hong, Tau Zhou

    Abstract: For stochastic wave equation, when the dissipative damping is a non-globally Lipschitz function of the velocity, there are few results on the long-time dynamics, in particular, the exponential ergodicity and strong law of large numbers, for the equation and its numerical discretization to our knowledge. Focus on this issue, the main contributions of this paper are as follows. First, based on const…

    Submitted 1 February, 2024; originally announced February 2024.

  36. arXiv:2401.16822  [pdf, other]

    cs.CV

    EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao

    Abstract: Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. Owing to the significant diversities between the natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in the infant stage. To fill the gap, a pioneer MLLM named EarthGPT integrating various multi-sensor RS interpre…

    Submitted 8 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  37. arXiv:2401.11613  [pdf, other]

    astro-ph.EP astro-ph.GA astro-ph.SR

    Hot Jupiter Formation in Dense Star Clusters

    Authors: Leonard Benkendorff, Francesco Flammini Dotti, Katja Stock, Maxwell Xu Cai, Rainer Spurzem

    Abstract: Hot Jupiters (HJ) are defined as Jupiter-mass exoplanets orbiting around their host star with an orbital period < 10 days. It is assumed that HJ do not form in-situ but ex-situ. Recent discoveries show that star clusters contribute to the formation of HJ. We present direct $N$-body simulations of planetary systems in star clusters and analyze the formation of HJ in them. We combine two direct $N$-…

    Submitted 21 January, 2024; originally announced January 2024.

  38. arXiv:2401.04997  [pdf, other]

    cs.IR

    Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis

    Authors: Lanling Xu, Junjie Zhang, Bingqian Li, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Recently, large language models such as ChatGPT have showcased remarkable abilities in solving general tasks, demonstrating the potential for applications in recommender systems. To assess how effectively LLMs can be used in recommendation tasks, our study primarily focuses on employing LLMs as recommender systems through prompting engineering. We propose a general framework for utilizing LLMs in…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 40 pages, under review

  39. arXiv:2401.01731  [pdf, other]

    quant-ph physics.optics

    Extracting double-quantum coherence in two-dimensional electronic spectroscopy under pump-probe geometry

    Authors: Mao-Rui Cai, Xue Zhang, Zi-Qian Cheng, Teng-Fei Yan, Hui Dong

    Abstract: Two-dimensional electronic spectroscopy (2DES) can be implemented with different geometries, e.g., BOXCARS, collinear and pump-probe geometries. The pump-probe geometry has its advantage of overlapping only two beams and reducing phase cycling steps. However, its applications are typically limited to observe the dynamics with single-quantum coherence and population, leaving the challenge to measur…

    Submitted 1 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 7 pages, 5 figures

  40. arXiv:2312.15154  [pdf, other]

    math.ST math.AG

    Completions to Discrete Probability Distributions in Log-linear Models

    Authors: May Cai, Cecilie Olesen Recke, Thomas Yahl

    Abstract: Completion problems, of recovering a point from a set of observed coordinates, are abundant in applications to image reconstruction, phylogenetics, and data science. We consider a completion problem coming from algebraic statistics: to describe the completions of a point to a probability distribution lying in a given log-linear model. When there are finitely many completions, we show that these po…

    Submitted 13 February, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: This work was conducted as a part of the Algebraic Statistics and Our Changing World Program hosted by the Institute for Mathematical and Statistical Innovation (IMSI). 21 pages, 5 figures

  41. arXiv:2312.10897  [pdf, other]

    cs.CL cs.AI cs.LG

    Generalized Category Discovery with Large Language Models in the Loop

    Authors: Wenbin An, Wenkai Shi, Feng Tian, Haonan Lin, QianYing Wang, Yaqiang Wu, Mingxiang Cai, Luyan Wang, Yan Chen, Haiping Zhu, Ping Chen

    Abstract: Generalized Category Discovery (GCD) is a crucial task that aims to recognize both known and novel categories from a set of unlabeled data by utilizing a few labeled data with only known categories. Due to the lack of supervision and category information, current methods usually perform poorly on novel categories and struggle to reveal semantic meanings of the discovered clusters, which limits the…

    Submitted 26 May, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by ACL 2024 Findings, code and data are available at https://github.com/Lackel/LOOP

  42. arXiv:2312.08153  [pdf, other]

    physics.comp-ph cs.LG

    $ρ$-Diffusion: A diffusion-based density estimation framework for computational physics

    Authors: Maxwell X. Cai, Kin Long Kelvin Lee

    Abstract: In physics, density $ρ(\cdot)$ is a fundamentally important scalar function to model, since it describes a scalar field or a probability density function that governs a physical process. Modeling $ρ(\cdot)$ typically scales poorly with parameter space, however, and quickly becomes prohibitively difficult and computationally expensive. One promising avenue to bypass this is to leverage the capabili…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 6 pages, 2 figures, accepted for publication at the NeurIPS 2023 workshop "Machine Learning and the Physical Sciences"

  43. arXiv:2312.00784  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

    Authors: Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee

    Abstract: While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual…

    Submitted 26 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR2024. Project page: https://vip-llava.github.io/

  44. arXiv:2311.17318  [pdf]

    cs.CY

    Impact of Indoor Mobility Behavior on the Respiratory Infectious Diseases Transmission Trends

    Authors: Ziwei Cui, Ming Cai, Zheng Zhu, Gongbo Chen, Yao Xiao

    Abstract: The importance of indoor human mobility in the transmission dynamics of respiratory infectious diseases has been acknowledged. Previous studies have predominantly addressed a single type of mobility behavior such as queueing and a series of behaviors under specific scenarios. However, these studies ignore the abstraction of mobility behavior in various scenes and the critical examination of how th…

    Submitted 28 November, 2023; originally announced November 2023.

  45. arXiv:2311.01487  [pdf, other]

    cs.CV cs.CL

    What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

    Authors: Yifan Du, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen

    Abstract: Visual instruction tuning is an essential approach to improving the zero-shot generalization capability of Multi-modal Large Language Models (MLLMs). A surge of visual instruction datasets with various focuses and characteristics have been proposed recently, enabling MLLMs to achieve surprising results on evaluation benchmarks. To develop more capable MLLMs, in this paper, we aim to investigate a…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Work in progress

  46. arXiv:2310.20398  [pdf, other]

    astro-ph.EP astro-ph.IM cs.LG physics.comp-ph

    A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks

    Authors: Veronica Saz Ulibarrena, Philipp Horn, Simon Portegies Zwart, Elena Sellentin, Barry Koren, Maxwell X. Cai

    Abstract: Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the l…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted for publication in the Journal of Computational Physics

  47. arXiv:2310.15457  [pdf, ps, other]

    math.NA

    An Unconditionally Stable Iterative Decoupled Algorithm for Multiple-Network Poroelasticity

    Authors: Meng Lei, Mingchao Cai, Feng Wang

    Abstract: In this work, we introduce an iterative decoupled algorithm designed for addressing the quasi-static multiple-network poroelasticity problem. This problem pertains to the simultaneous modeling of fluid flow and deformations within an elastic porous medium permeated by multiple fluid networks, each with distinct characteristics. Our approach focuses on the total-pressure-based formulation, which tr…

    Submitted 27 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: to be submitted

  48. arXiv:2310.13610  [pdf, other]

    cs.CL cs.AI

    Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

    Authors: Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, Muzhen Cai, Bing Qin

    Abstract: Explaining black-box model behavior with natural language has achieved impressive results in various NLP tasks. Recent research has explored the utilization of subsequences from the input text as a rationale, providing users with evidence to support the model decision. Although existing frameworks excel in generating high-quality rationales while achieving high task performance, they neglect to ac…

    Submitted 20 October, 2023; originally announced October 2023.

  49. arXiv:2310.05405  [pdf, other]

    nucl-ex

    Subthreshold production of $J/ψ$ mesons from the deuteron with SoLID

    Authors: T. Liu, Z. W. Zhao, M. Cai, D. Byer, H. Gao

    Abstract: The electro- and photo-production of $J/ψ$ meson near the threshold from the proton is relevant to the search of hidden charm pentaquark candidates reported by the LHCb collaboration, and the study of the QCD trace anomaly's contribution to the proton mass. It is also expected to be sensitive to the QCD van der Waals interaction, that is mediated by multi-gluon exchanges and expected to dominate t…

    Submitted 9 October, 2023; originally announced October 2023.

  50. arXiv:2310.05036  [pdf, other]

    cs.AI cs.CL

    AvalonBench: Evaluating LLMs Playing the Game of Avalon

    Authors: Jonathan Light, Min Cai, Sheng Shen, Ziniu Hu

    Abstract: In this paper, we explore the potential of Large Language Models (LLMs) Agents in playing the strategic social deduction game, Resistance Avalon. Players in Avalon are challenged not only to make informed decisions based on dynamically evolving game phases, but also to engage in discussions where they must deceive, deduce, and negotiate with other players. These characteristics make Avalon a compe…

    Submitted 8 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.