Showing 1–50 of 125 results for author: Fan, M

  1. arXiv:2406.08347  [pdf, other]

    cs.RO

    Trajectory optimization of tail-sitter considering speed constraints

    Authors: Mingyue Fan, Fangfang Xie, Tingwei Ji, Yao Zheng

    Abstract: Tail-sitters, with the advantages of both the fixed-wing unmanned aerial vehicles (UAVs) and vertical take-off and landing UAVs, have been widely designed and researched in recent years. With the change in modern UAV application scenarios, it is required that UAVs have fast maneuverable three-dimensional flight capabilities. Due to the highly nonlinear aerodynamics produced by the fuselage and win…

    Submitted 23 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2406.01159  [pdf, other]

    cs.CV

    Dimba: Transformer-Mamba Diffusion Models

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang

    Abstract: This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba sequentially stacked blocks alternate between Transformer and Mamba layers, and integrate conditional information through the cross-attention layer, thus capitalizing on the advantages of both architectural paradigms. We investig…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.15780  [pdf, other]

    cs.CV cs.LG

    Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier

    Authors: Aristeidis Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, Moetasim Ashfaq, Ming Fan, Jong Youl Choi, Mohamed Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang

    Abstract: Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text has inspired scaling sequence lengths in ViTs, yet adapting these for ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to…

    Submitted 17 April, 2024; originally announced May 2024.

  4. arXiv:2405.03782  [pdf, other]

    cs.LG cs.HC

    Interpretable Data Fusion for Distributed Learning: A Representative Approach via Gradient Matching

    Authors: Mengchen Fan, Baocheng Geng, Keren Li, Xueqian Wang, Pramod K. Varshney

    Abstract: This paper introduces a representative-based approach for distributed learning that transforms multiple raw data points into a virtual representation. Unlike traditional distributed learning methods such as Federated Learning, which do not offer human interpretability, our method makes complex machine learning processes accessible and comprehensible. It achieves this by condensing extensive datase…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2404.14712  [pdf, other]

    physics.ao-ph cs.AI cs.DC eess.IV physics.geo-ph

    ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

    Authors: Xiao Wang, Aristeidis Tsaris, Siyan Liu, Jong-Youl Choi, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

    Abstract: Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitati…

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2404.13358  [pdf, other]

    cs.SD cs.AI eess.AS

    Music Consistency Models

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. It has proven to be advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music…

    Submitted 20 April, 2024; originally announced April 2024.

  7. arXiv:2404.04478  [pdf, other]

    cs.CV

    Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

    Abstract: Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context tasks, such as high-resolution image generation. This paper introduces a series of architectures adapted from the RWKV model used in the NLP, with requisite modifications tailored for diffusio…

    Submitted 5 April, 2024; originally announced April 2024.

  8. arXiv:2404.00502  [pdf, other]

    cs.LG math.NA

    Conditional Pseudo-Reversible Normalizing Flow for Surrogate Modeling in Quantifying Uncertainty Propagation

    Authors: Minglei Yang, Pengjun Wang, Ming Fan, Dan Lu, Yanzhao Cao, Guannan Zhang

    Abstract: We introduce a conditional pseudo-reversible normalizing flow for constructing surrogate models of a physical model polluted by additive noise to efficiently quantify forward and inverse uncertainty propagation. Existing surrogate modeling approaches usually focus on approximating the deterministic component of physical model. However, this strategy necessitates knowledge of noise and resorts to a…

    Submitted 30 March, 2024; originally announced April 2024.

  9. arXiv:2403.16107  [pdf, other]

    cs.HC

    Designing Upper-Body Gesture Interaction with and for People with Spinal Muscular Atrophy in VR

    Authors: Jingze Tian, Yingna Wang, Keye Yu, Liyi Xu, Junan Xie, Franklin Mingzhe Li, Yafeng Niu, Mingming Fan

    Abstract: Recent research proposed gaze-assisted gestures to enhance interaction within virtual reality (VR), providing opportunities for people with motor impairments to experience VR. Compared to people with other motor impairments, those with Spinal Muscular Atrophy (SMA) exhibit enhanced distal limb mobility, providing them with more design space. However, it remains unknown what gaze-assisted upper-bod…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  10. arXiv:2403.09326  [pdf, other]

    cs.GR cs.AI

    HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

    Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang

    Abstract: We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To tackle the challenges of lacking fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce…

    Submitted 10 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 17 figures

    ACM Class: I.2.6; I.3.8

  11. Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring

    Authors: Can Liu, Siying Hu, Li Feng, Mingming Fan

    Abstract: Voice dictation is increasingly used for text entry, especially in mobile scenarios. However, the speech-based experience gets disrupted when users must go back to a screen and keyboard to review and edit the text. While existing dictation systems focus on improving transcription and error correction, little is known about how to support speech input for the entire text creation process, including…

    Submitted 8 March, 2024; originally announced March 2024.

    Journal ref: Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 338 (November 2022), 33 pages

  12. To Reach the Unreachable: Exploring the Potential of VR Hand Redirection for Upper Limb Rehabilitation

    Authors: Peixuan Xiong, Yukai Zhang, Nandi Zhang, Shihan Fu, Xin Li, Yadan Zheng, Jinni Zhou, Xiquan Hu, Mingming Fan

    Abstract: Rehabilitation therapies are widely employed to assist people with motor impairments in regaining control over their affected body parts. Nevertheless, factors such as fatigue and low self-efficacy can hinder patient compliance during extensive rehabilitation processes. Utilizing hand redirection in virtual reality (VR) enables patients to accomplish seemingly more challenging tasks, thereby bolst…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  13. arXiv:2403.05087  [pdf, other]

    cs.GR cs.CV

    SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

    Authors: Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang

    Abstract: We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: [CVPR 2024] Code and data are available at https://github.com/initialneil/SplattingAvatar

  14. LightSword: A Customized Virtual Reality Exergame for Long-Term Cognitive Inhibition Training in Older Adults

    Authors: Qiuxin Du, Zhen Song, Haiyan Jiang, Xiaoying Wei, Dongdong Weng, Mingming Fan

    Abstract: The decline of cognitive inhibition significantly impacts older adults' quality of life and well-being, making it a vital public health problem in today's aging society. Previous research has demonstrated that Virtual reality (VR) exergames have great potential to enhance cognitive inhibition among older adults. However, existing commercial VR exergames were unsuitable for older adults' long-term…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 23 pages

    Journal ref: Proceedings of the CHI Conference on Human Factors in Computing Systems 2024 (CHI '24)

  15. arXiv:2402.15723  [pdf, other]

    cs.HC

    FetchAid: Making Parcel Lockers More Accessible to Blind and Low Vision People With Deep-learning Enhanced Touchscreen Guidance, Error-Recovery Mechanism, and AR-based Search Support

    Authors: Zhitong Guan, Zeyu Xiong, Mingming Fan

    Abstract: Parcel lockers have become an increasingly prevalent last-mile delivery method. Yet, a recent study revealed its accessibility challenges to blind and low-vision people (BLV). Informed by the study, we designed FetchAid, a standalone intelligent mobile app assisting BLV in using a parcel locker in real-time by integrating computer vision and augmented reality (AR) technologies. FetchAid first uses…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  16. arXiv:2402.15719  [pdf, other]

    cs.HC

    "It Is Hard to Remove from My Eye": Design Makeup Residue Visualization System for Chinese Traditional Opera (Xiqu) Performers

    Authors: Zeyu Xiong, Shihan Fu, Yanying Zhu, Chenqing Zhu, Xiaojuan Ma, Mingming Fan

    Abstract: Chinese traditional opera (Xiqu) performers often experience skin problems due to the long-term use of heavy-metal-laden face paints. To explore the current skincare challenges encountered by Xiqu performers, we conducted an online survey (N=136) and semi-structured interviews (N=15) as a formative study. We found that incomplete makeup removal is the leading cause of human-induced skin problems,…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA

  17. arXiv:2402.05608  [pdf, other]

    cs.CV cs.MM

    Scalable Diffusion Models with State Space Backbone

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

    Abstract: This paper presents a new exploration into a category of diffusion models built upon state space architecture. We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space. Given its notable efficacy in accommodating long-range dependencies, Diffusion State Space Models (DiS) are dis…

    Submitted 28 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  18. arXiv:2402.04991  [pdf, other]

    cs.HC

    Exploring the Opportunity of Augmented Reality (AR) in Supporting Older Adults Explore and Learn Smartphone Applications

    Authors: Xiaofu Jin, Wai Tong, Xiaoying Wei, Xian Wang, Emily Kuang, Xiaoyu Mo, Huamin Qu, Mingming Fan

    Abstract: The global aging trend compels older adults to navigate the evolving digital landscape, presenting a substantial challenge in mastering smartphone applications. While Augmented Reality (AR) holds promise for enhancing learning and user experience, its role in aiding older adults' smartphone app exploration remains insufficiently explored. Therefore, we conducted a two-phase study: (1) a workshop w…

    Submitted 7 February, 2024; originally announced February 2024.

  19. arXiv:2401.14121  [pdf, other]

    cs.CV

    Incorporating Exemplar Optimization into Training with Dual Networks for Human Mesh Recovery

    Authors: Yongwei Nie, Mingxian Fan, Chengjiang Long, Qing Zhang, Jian Zhu, Xuemiao Xu

    Abstract: We propose a novel optimization-based human mesh recovery method from a single image. Given a test exemplar, previous approaches optimize the pre-trained regression network to minimize the 2D re-projection loss, which however suffer from over-/under-fitting problems. This is because the ``exemplar optimization'' at testing time has too weak relation to the pre-training process, and the exemplar op…

    Submitted 25 January, 2024; originally announced January 2024.

  20. arXiv:2401.05739  [pdf, other]

    cs.SE cs.CR

    Cross-Inlining Binary Function Similarity Detection

    Authors: Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, Ting Liu

    Abstract: Binary function similarity detection plays an important role in a wide range of security applications. Existing works usually assume that the query function and target function share equal semantics and compare their full semantics to obtain the similarity. However, we find that the function mapping is more complex, especially when function inlining happens. In this paper, we will systematically…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted at ICSE 2024 (Second Cycle). Camera-ready version

  21. arXiv:2401.02037   

    cs.IT

    Simplified Information Geometry Approach for Massive MIMO-OFDM Channel Estimation -- Part II: Convergence Analysis

    Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, Xiqi Gao, Xiang-Gen Xia, Dirk Slock

    Abstract: In Part II of this two-part paper, we prove the convergence of the simplified information geometry approach (SIGA) proposed in Part I. For a general Bayesian inference problem, we first show that the iteration of the common second-order natural parameter (SONP) is separated from that of the common first-order natural parameter (FONP). Hence, the convergence of the common SONP can be checked indepe…

    Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: I'm merging the two parts of this paper (arXiv:2401.02035 and arXiv:2401.02037). The combined paper will appear as v2 of arXiv:2401.02035. So I need to withdraw this paper

  22. arXiv:2401.02035  [pdf, ps, other]

    cs.IT

    Efficient Information Geometry Approach for Massive MIMO-OFDM Channel Estimation

    Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, An-An Lu, Wen Zhong, Xiqi Gao, Xiaohu You, Xiang-Gen Xia, Dirk Slock

    Abstract: We investigate the channel estimation for massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We revisit the information geometry approach (IGA) for massive MIMO-OFDM channel estimation. By using the constant magnitude property of the entries of the measurement matrix, we find that the second-order natural parameters of the distributions on all th…

    Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  23. arXiv:2312.14611  [pdf, other]

    cs.CV

    Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

    Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, some existing methods fine-tune the entire model or the textual embedding for structural consistency, but they are time-consuming and fail to perform non…

    Submitted 22 December, 2023; originally announced December 2023.

  24. arXiv:2312.03987  [pdf, other]

    cs.CL cs.AI

    Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

    Authors: Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du

    Abstract: Entity resolution (ER) is an important data integration task with a wide spectrum of applications. The state-of-the-art solutions on ER rely on pre-trained language models (PLMs), which require fine-tuning on a lot of labeled matching/non-matching entity pairs. Recently, large languages models (LLMs), such as GPT-4, have shown the ability to perform many tasks without tuning model parameters, whic…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 14 pages, 7 figures

  25. arXiv:2311.15830  [pdf, other]

    cs.SD cs.CV eess.AS

    A-JEPA: Joint-Embedding Predictive Architecture Can Listen

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: This paper presents that the masked-modeling principle driving the success of large foundational vision models can be effectively applied to audio by making predictions in a latent space. We introduce Audio-based Joint-Embedding Predictive Architecture (A-JEPA), a simple extension method for self-supervised learning from the audio spectrum. Following the design of I-JEPA, our A-JEPA encodes visibl…

    Submitted 11 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.06405 by other authors

  26. arXiv:2311.11269  [pdf, other]

    cs.HC cs.MM

    OperARtistry: An AR-based Interactive Application to Assist the Learning of Chinese Traditional Opera (Xiqu) Makeup

    Authors: Zeyu Xiong, Shihan Fu, Mingming Fan

    Abstract: Chinese Traditional Opera (Xiqu) is an important type of intangible cultural heritage and one key characteristic of Xiqu is its visual effects on face achieved via makeup. However, Xiqu makeup process, especially the eye-area makeup process, is complex and time-consuming, which poses a learning challenge for potential younger inheritors. We introduce OperARtistry, an interactive application based…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 11 pages, 9 figures, In Proceedings of The Eleventh International Symposium of Chinese CHI (Chinese CHI 2023)

  27. arXiv:2311.06423  [pdf, other]

    cs.LG cs.CR cs.CV

    Flatness-aware Adversarial Attack

    Authors: Mingyuan Fan, Xiaodan Li, Cen Chen, Yinggui Wang

    Abstract: The transferability of adversarial examples can be exploited to launch black-box attacks. However, adversarial examples often present poor transferability. To alleviate this issue, by observing that the diversity of inputs can boost transferability, input regularization based methods are proposed, which craft adversarial examples by combining several transformed inputs. We reveal that input regula…

    Submitted 10 November, 2023; originally announced November 2023.

  28. CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming

    Authors: Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, Zhicong Lu

    Abstract: Natural language (NL) programming has become more approachable due to the powerful code-generation capability of large language models (LLMs). This shift to using NL to program enhances collaborative programming by reducing communication barriers and context-switching among programmers from varying backgrounds. However, programmers may face challenges during prompt engineering in a collaborative s…

    Submitted 1 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  29. uxSense: Supporting User Experience Analysis with Visualization and Computer Vision

    Authors: Andrea Batch, Yipeng Ji, Mingming Fan, Jian Zhao, Niklas Elmqvist

    Abstract: Analyzing user behavior from usability evaluation can be a challenging and time-consuming task, especially as the number of participants and the scale and complexity of the evaluation grows. We propose uxSense, a visual analytics system using machine learning methods to extract user behavior from audio and video recordings as parallel time-stamped data streams. Our implementation draws on pattern…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 21 pages, 14 figures

    Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2023

  30. arXiv:2310.06959  [pdf, ps, other]

    cs.PL

    Proof Repair across Quotient Type Equivalences

    Authors: Cosmo Viola, Max Fan, Talia Ringer

    Abstract: Proofs in proof assistants like Coq can be brittle, breaking easily in response to changes in the terms and types those proofs depend on. To address this, recent work introduced an algorithm and tool in Coq to automatically repair broken proofs in response to changes that correspond to type equivalences. However, many changes remained out of the scope of this algorithm and tool -- especially chang…

    Submitted 18 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: for associated code, see https://github.com/InnovativeInventor/proof-repair-quotients

  31. arXiv:2310.04610  [pdf, other]

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri , et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  32. arXiv:2309.11816  [pdf, other]

    cs.HC

    Designing Loving-Kindness Meditation in Virtual Reality for Long-Distance Romantic Relationships

    Authors: Xian Wang, Xiaoyu Mo, Lik-Hang Lee, Xiaoying Wei, Xiaofu Jin, Mingming Fan, Pan Hui

    Abstract: Loving-kindness meditation (LKM) is used in clinical psychology for couples' relationship therapy, but physical isolation can make the relationship more strained and inaccessible to LKM. Virtual reality (VR) can provide immersive LKM activities for long-distance couples. However, no suitable commercial VR applications for couples exist to engage in LKM activities of long-distance. This paper organ…

    Submitted 21 September, 2023; originally announced September 2023.

  33. arXiv:2309.07408  [pdf, other]

    cs.RO

    An Explicit Method for Fast Monocular Depth Recovery in Corridor Environments

    Authors: Yehao Liu, Ruoyan Xia, Xiaosu Xu, Zijian Wang, Yiqing Ya, Mingze Fan

    Abstract: Monocular cameras are extensively employed in indoor robotics, but their performance is limited in visual odometry, depth estimation, and related applications due to the absence of scale information. Depth estimation refers to the process of estimating a dense depth map from the corresponding input image, existing researchers mostly address this issue through deep learning-based approaches, yet the…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures. arXiv admin note: text overlap with arXiv:2111.08600 by other authors

  34. Exploring the Opportunities of AR for Enriching Storytelling with Family Photos between Grandparents and Grandchildren

    Authors: Zisu Li, Li Feng, Chen Liang, Yuru Huang, Mingming Fan

    Abstract: Storytelling with family photos, as an important mode of reminiscence-based activities, can be instrumental in promoting intergenerational communication between grandparents and grandchildren by strengthening generation bonds and shared family values. Motivated by challenges that existing technology approaches encountered for improving intergenerational storytelling (e.g., the need to hold the tab…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 3

  35. arXiv:2308.16160  [pdf, other]

    cs.CV eess.IV

    Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions

    Authors: Miao Fan, Mingrui Chen, Chen Hu, Shuchang Zhou

    Abstract: Image matching is a fundamental and critical task in various visual applications, such as Simultaneous Localization and Mapping (SLAM) and image retrieval, which require accurate pose estimation. However, most existing methods ignore the occlusion relations between objects caused by camera motion and scene structure. In this paper, we propose Occ$^2$Net, a novel image matching method that models o…

    Submitted 14 August, 2023; originally announced August 2023.

  36. arXiv:2308.09484  [pdf, other]

    cs.OH

    An Efficient Early-breaking Estimation and Tree-splitting Missing RFID Tag Identification Protocol

    Authors: Lijuan Zhang, Mingqiu Fan, Chunni Yu, Lei Lei

    Abstract: Recent statistics have demonstrated that missing items have become the main cause of loss for retailers in inventory management. To quickly identify missing tags, traditional protocols adopt Aloha-based strategies which take a long time, especially when the number of tags is large. Among them, few works considered the effect of unexpected unknown tags on the missing tag identification process. Wit…

    Submitted 16 August, 2023; originally announced August 2023.

  37. arXiv:2308.05585  [pdf, other]

    cs.AI

    Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length

    Authors: Miao Fan, Chen Hu, Shuchang Zhou

    Abstract: The Reinforcement Learning from Human Feedback (RLHF) plays a pivotal role in shaping the impact of large language models (LLMs), contributing significantly to controlling output toxicity and selecting output styles, particularly as LLMs often harbor misleading content, highlighting the urgency to align them with human values for secure AI systems. The RLHF, characterized by complexity, instabilit…

    Submitted 10 August, 2023; originally announced August 2023.

  38. arXiv:2307.16680  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CR cs.CV

    On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook

    Authors: Mingyuan Fan, Chengyu Wang, Cen Chen, Yang Liu, Jun Huang

    Abstract: Diffusion models and large language models have emerged as leading-edge generative models, revolutionizing various aspects of human life. However, the practical implementations of these models have also exposed inherent risks, bringing to the forefront their evil sides and sparking concerns regarding their trustworthiness. Despite the wealth of literature on this subject, a comprehensive survey sp…

    Submitted 7 December, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: A big update, draft

  39. arXiv:2307.07916  [pdf, other]

    cs.LG cs.CR cs.CV

    On the Robustness of Split Learning against Adversarial Attacks

    Authors: Mingyuan Fan, Cen Chen, Chengyu Wang, Wenmeng Zhou, Jun Huang

    Abstract: Split learning enables collaborative deep learning model training while preserving data privacy and model security by avoiding direct sharing of raw data and model details (i.e., sever and clients only hold partial sub-networks and exchange intermediate computations). However, existing research has mainly focused on examining its reliability for privacy protection, with little investigation into m…

    Submitted 17 July, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: accepted by ECAI 2023, camera-ready version

  40. arXiv:2306.16139  [pdf, other]

    cs.CR

    VERTICES: Efficient Two-Party Vertical Federated Linear Model with TTP-aided Secret Sharing

    Authors: Mingxuan Fan, Yilun Jin, Liu Yang, Zhenghang Ren, Kai Chen

    Abstract: Vertical Federated Learning (VFL) has emerged as one of the most predominant approaches for secure collaborative machine learning where the training data is partitioned by features among multiple parties. Most VFL algorithms primarily rely on two fundamental privacy-preserving techniques: Homomorphic Encryption (HE) and secure Multi-Party Computation (MPC). Though generally considered with stronge…

    Submitted 28 June, 2023; originally announced June 2023.

  41. arXiv:2304.05818  [pdf, other]

    cs.CV

    Gradient-Free Textual Inversion

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Recent works on personalized text-to-image generation usually learn to bind a special token with specific subjects or styles of a few given images by tuning its embedding through gradient descent. It is natural to question whether we can optimize the textual inversions by only accessing the process of model inference. As only requiring the forward computation to determine the textual inversion ret…

    Submitted 12 April, 2023; originally announced April 2023.

  42. arXiv:2303.10441  [pdf, other]

    cs.HC

    Enabling Voice-Accompanying Hand-to-Face Gesture Recognition with Cross-Device Sensing

    Authors: Zisu Li, Cheng Liang, Yuntao Wang, Yue Qin, Chun Yu, Yukang Yan, Mingming Fan, Yuanchun Shi

    Abstract: Gestures performed accompanying the voice are essential for voice interaction to convey complementary semantics for interaction purposes such as wake-up state and input modality. In this paper, we investigated voice-accompanying hand-to-face (VAHF) gestures for voice interaction. We targeted hand-to-face gestures because such gestures relate closely to speech and yield significant acoustic feature…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: This paper has been accepted by ACM CHI 2023

  43. Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text)

    Authors: Emily Kuang, Ehsan Jahangirzadeh Soure, Mingming Fan, Jian Zhao, Kristen Shinohara

    Abstract: AI is promising in assisting UX evaluators with analyzing usability tests, but its judgments are typically presented as non-interactive visualizations. Evaluators may have questions about test recordings, but have no way of asking them. Interactive conversational assistants provide a Q&A dynamic that may improve analysis efficiency and evaluator autonomy. To understand the full range of analysis-r…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

  44. Enhancing Older Adults' Gesture Typing Experience Using the T9 Keyboard on Small Touchscreen Devices

    Authors: Emily Kuang, Ruihuan Chen, Mingming Fan

    Abstract: Older adults increasingly adopt small-screen devices, but limited motor dexterity hinders their ability to type effectively. While a 9-key (T9) keyboard allocates larger space to each key, it is shared by multiple consecutive letters. Consequently, users must interrupt their gestures when typing consecutive letters, leading to inefficiencies and poor user experience. Thus, we proposed a novel keyb…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

  45. Bridging the Generational Gap: Exploring How Virtual Reality Supports Remote Communication Between Grandparents and Grandchildren

    Authors: Xiaoying Wei, Yizheng Gu, Emily Kuang, Xian Wang, Beiyan Cao, Xiaofu Jin, Mingming Fan

    Abstract: When living apart, grandparents and grandchildren often use audio-visual communication approaches to stay connected. However, these approaches seldom provide sufficient companionship and intimacy due to a lack of co-presence and spatial interaction, which can be fulfilled by immersive virtual reality (VR). To understand how grandparents and grandchildren might leverage VR to facilitate their remot…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

  46. Sparkling Silence: Practices and Challenges of Livestreaming Among Deaf or Hard of Hearing Streamers

    Authors: Beiyan Cao, Changyang He, Muzhi Zhou, Mingming Fan

    Abstract: Understanding livestream platforms' accessibility challenges for minority groups, such as people with disabilities, is critical to increasing the diversity and inclusion of those platforms. While prior work investigated the experiences of streamers with vision or motor loss, little is known about the experiences of deaf or hard of hearing (DHH) streamers who must work with livestreaming platforms…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

  47. arXiv:2302.13860  [pdf, other]

    cs.CR cs.SE

    Do as You Say: Consistency Detection of Data Practice in Program Code and Privacy Policy in Mini-App

    Authors: Yin Wang, Ming Fan, Junfeng Liu, Junjie Tao, Wuxia Jin, Qi Xiong, Yuhao Liu, Qinghua Zheng, Ting Liu

    Abstract: Mini-app is an emerging form of mobile application that combines web technology with native capabilities. Its features, e.g., no need to download and no installation, have made it popular rapidly. However, privacy issues that violate the laws or regulations are breeding in the swiftly expanding mini-app ecosystem. The consistency between what the mini-app does about the data in the program code an…

    Submitted 27 February, 2023; originally announced February 2023.

  48. CoPracTter: Toward Integrating Personalized Practice Scenarios, Timely Feedback and Social Support into An Online Support Tool for Coping with Stuttering in China

    Authors: Feng Li, Zeyu Xiong, Xinyi Li, Mingming Fan

    Abstract: Stuttering is a speech disorder influencing over 70 million people worldwide, including 13 million in China. It causes low self-esteem among other detrimental effects on people who stutter (PwS). Although prior work has explored approaches to assist PwS, they primarily focused on western contexts. In our formative study, we found unique practices and challenges among Chinese PwS. We then iterative…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

  49. "I am the follower, also the boss": Exploring Different Levels of Autonomy and Machine Forms of Guiding Robots for the Visually Impaired

    Authors: Yan Zhang, Ziang Li, Haole Guo, Luyao Wang, Qihe Chen, Wenjie Jiang, Mingming Fan, Guyue Zhou, Jiangtao Gong

    Abstract: Guiding robots, in the form of canes or cars, have recently been explored to assist blind and low vision (BLV) people. Such robots can provide full or partial autonomy when guiding. However, the pros and cons of different forms and autonomy for guiding robots remain unknown. We sought to fill this gap. We designed an autonomy-switchable guiding robotic cane and car. We conducted a controlled lab-stud…

    Submitted 7 February, 2023; originally announced February 2023.

  50. arXiv:2301.10412  [pdf, other]

    cs.CL cs.AI cs.CR

    BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing

    Authors: Jiali Wei, Ming Fan, Wenjing Jiao, Wuxia Jin, Ting Liu

    Abstract: Deep neural networks (DNNs) and natural language processing (NLP) systems have developed rapidly and have been widely used in various real-world fields. However, they have been shown to be vulnerable to backdoor attacks. Specifically, the adversary injects a backdoor into the model during the training phase, so that input samples with backdoor triggers are classified as the target class. Some atta…

    Submitted 25 January, 2023; originally announced January 2023.