Showing 1–50 of 153 results for author: Guo, B

Searching in archive cs.
  1. arXiv:2407.00016  [pdf, other]

    cs.DC

    AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems

    Authors: Lehao Wang, Zhiwen Yu, Sicong Liu, Chenshu Wu, Xiangrui Xu, Bin Guo

    Abstract: Running multi-task DNNs on mobiles is an emerging trend for various applications like autonomous driving and mobile NLP. Mobile DNNs are often compressed to fit the limited resources and thus suffer from degraded accuracy and generalizability due to data drift. DNN evolution, e.g., continuous learning and domain adaptation, has been demonstrated effective in overcoming these issues, mostly for sin…

    Submitted 2 May, 2024; originally announced July 2024.

    Comments: Accepted by NSDI'24 Poster

  2. arXiv:2406.17580  [pdf, other]

    cs.DC

    Experimental Evaluation of Distributed k-Core Decomposition

    Authors: Bin Guo, Runze Zhao

    Abstract: Given an undirected graph, the $k$-core is a subgraph in which each node has at least $k$ connections, which is widely used in graph analytics to identify core subgraphs within a larger graph. The sequential $k$-core decomposition algorithm faces limitations due to memory constraints and data graphs can be inherently distributed. A distributed approach is proposed to overcome limitations by allowi…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.14217  [pdf, other]

    cs.LG cs.CR

    Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning

    Authors: Yujing Wang, Hainan Zhang, Sijia Wen, Wangjie Qiu, Binghui Guo

    Abstract: Federated learning is highly susceptible to model poisoning attacks, especially those meticulously crafted for servers. Traditional defense methods mainly focus on updating assessments or robust aggregation against manually crafted myopic attacks. When facing advanced attacks, their defense stability is notably insufficient. Therefore, it is imperative to develop adaptive defenses against such adv…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.09397  [pdf, other]

    cs.CV cs.AI

    Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

    Authors: Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

    Abstract: Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 26 figures, under review

  5. arXiv:2405.16886  [pdf, other]

    cs.CV

    Hawk: Learning to Understand Open-World Video Anomalies

    Authors: Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen

    Abstract: Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In t…

    Submitted 27 May, 2024; originally announced May 2024.

  6. arXiv:2405.01851  [pdf, other]

    cs.LG cs.AI

    Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

    Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

    Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been e…

    Submitted 3 May, 2024; originally announced May 2024.

  7. arXiv:2404.19209  [pdf, other]

    cs.DC

    AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

    Authors: Zheng Lin, Bin Guo, Sicong Liu, Wentao Zhou, Yasan Ding, Yu Zhang, Zhiwen Yu

    Abstract: Deep neural network (DNN) has driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning…

    Submitted 29 April, 2024; originally announced April 2024.

  8. arXiv:2404.13033  [pdf, other]

    cs.CL

    Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs

    Authors: Biyang Guo, He Wang, Wenyilin Xiao, Hong Chen, Zhuxin Lee, Songqiao Han, Hailiang Huang

    Abstract: In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet, the realm of the sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, is largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical appro…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 figures, 14 tables

  9. arXiv:2404.10667  [pdf, other]

    cs.CV

    VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

    Authors: Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo

    Abstract: We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the percep…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Tech Report. Project webpage: https://www.microsoft.com/en-us/research/project/vasa-1/

  10. arXiv:2404.04481  [pdf, other]

    cs.IR cs.AI cs.LG

    Joint Identifiability of Cross-Domain Recommendation via Hierarchical Subspace Disentanglement

    Authors: Jing Du, Zesheng Ye, Bin Guo, Zhiwen Yu, Lina Yao

    Abstract: Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Existing works rely on either representation alignment or transformation bridges, but they struggle on identifying domain-shared from domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its join…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: accepted to SIGIR 2024 as a Full Research Paper

  11. arXiv:2403.19655  [pdf, other]

    cs.CV

    GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling

    Authors: Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo

    Abstract: We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive Ga…

    Submitted 23 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Update for digital avatar creation and text-to-3D synthesis; Project Page: https://gaussiancube.github.io/

  12. arXiv:2403.14623  [pdf, other]

    cs.LG cs.CV

    Simplified Diffusion Schrödinger Bridge

    Authors: Zhicong Tang, Tiankai Hang, Shuyang Gu, Dong Chen, Baining Guo

    Abstract: This paper introduces a novel theoretical simplification of the Diffusion Schrödinger Bridge (DSB) that facilitates its unification with Score-based Generative Models (SGMs), addressing the limitations of DSB in complex data generation and enabling faster convergence and enhanced performance. By employing SGMs as an initial solution for DSB, our approach capitalizes on the strengths of both framew…

    Submitted 27 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2403.12806  [pdf, other]

    cs.CV

    VisualCritic: Making LMMs Perceive Visual Quality Like Humans

    Authors: Zhipeng Huang, Zhizheng Zhang, Yiting Lu, Zheng-Jun Zha, Zhibo Chen, Baining Guo

    Abstract: At present, large multimodal models (LMMs) have exhibited impressive generalization capabilities in understanding and generating visual signals. However, they currently still lack sufficient capability to perceive low-level visual quality akin to human perception. Can LMMs achieve this and show the same degree of generalization in this regard? If so, not only could the versatility of LMMs be furth…

    Submitted 19 March, 2024; originally announced March 2024.

  14. arXiv:2403.12801  [pdf, other]

    cs.CV

    RelationVLM: Making Large Vision-Language Models Understand Visual Relations

    Authors: Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo

    Abstract: The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved. Very recent works enable LVLMs to localize object-level visual contents and ground text to them. Nonetheless, current LVLMs still struggle to precisely understand visual relations due to the lack of relevant data. In this wor…

    Submitted 19 March, 2024; originally announced March 2024.

  15. arXiv:2402.04631  [pdf, other]

    cs.CL

    The Future of Cognitive Strategy-enhanced Persuasive Dialogue Agents: New Perspectives and Trends

    Authors: Mengqi Chen, Bin Guo, Hao Wang, Haoyu Li, Qian Zhao, Jingqi Liu, Yasan Ding, Yan Pan, Zhiwen Yu

    Abstract: Persuasion, as one of the crucial abilities in human communication, has garnered extensive attention from researchers within the field of intelligent dialogue systems. We humans tend to persuade others to change their viewpoints, attitudes or behaviors through conversations in various scenarios (e.g., persuasion for social good, arguing in online platforms). Developing dialogue agents that can per…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 36 pages, 6 figures

  16. EchoPFL: Asynchronous Personalized Federated Learning on Mobile Devices with On-Demand Staleness Control

    Authors: Xiaochen Li, Sicong Liu, Zimu Zhou, Bin Guo, Yuan Xu, Zhiwen Yu

    Abstract: The rise of mobile devices with abundant sensory data and local computing capabilities has driven the trend of federated learning (FL) on these devices. And personalized FL (PFL) emerges to train specific deep models for each mobile device to address data heterogeneity and varying performance preferences. However, mobile training times vary significantly, resulting in either delay (when waiting fo…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: accepted by Ubicomp2024

  17. arXiv:2401.14915  [pdf, other]

    cs.HC cs.AI

    Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with Students

    Authors: Chengbo Zheng, Kangyu Yuan, Bingcan Guo, Reza Hadi Mogavi, Zhenhui Peng, Shuai Ma, Xiaojuan Ma

    Abstract: The increasing use of Artificial Intelligence (AI) by students in learning presents new challenges for assessing their learning outcomes in project-based learning (PBL). This paper introduces a co-design study to explore the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate an alternative worl…

    Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Conditionally accepted by CHI '24

  18. arXiv:2401.13011  [pdf, other]

    cs.CV

    CCA: Collaborative Competitive Agents for Image Editing

    Authors: Tiankai Hang, Shuyang Gu, Dong Chen, Xin Geng, Baining Guo

    Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Models (LLMs) based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructio…

    Submitted 23 January, 2024; originally announced January 2024.

  19. arXiv:2401.12974  [pdf, other]

    eess.IV cs.CV q-bio.QM

    SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI

    Authors: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski

    Abstract: Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment pla…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 15 figures

  20. arXiv:2312.15821  [pdf, other]

    cs.SD cs.LG eess.AS

    Audiobox: Unified Audio Generation with Natural Language Prompts

    Authors: Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

    Abstract: Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data. However, these models lack controllability in sever…

    Submitted 25 December, 2023; originally announced December 2023.

  21. arXiv:2312.11459  [pdf, other]

    cs.CV

    VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

    Authors: Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo

    Abstract: This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed to efficiently acquire feature volumes from multi-view images. The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net. This research further addresses the challenges of inaccur…

    Submitted 28 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  22. Implementing Digital Twin in Field-Deployed Optical Networks: Uncertain Factors, Operational Guidance, and Field-Trial Demonstration

    Authors: Yuchen Song, Min Zhang, Yao Zhang, Yan Shi, Shikui Shen, Bingli Guo, Shanguo Huang, Danshi Wang

    Abstract: Digital twin has revolutionized optical communication networks by enabling their full life-cycle management, including design, troubleshooting, optimization, upgrade, and prediction. While extensive literature exists on frameworks, standards, and applications of digital twin, there is a pressing need in implementing digital twin in field-deployed optical networks operating in real-world environmen…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures. Accepted by IEEE Network Magazine, early access

  23. arXiv:2311.18829  [pdf, other]

    cs.CV

    MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

    Authors: Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

    Abstract: We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two signific…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://wangyanhui666.github.io/MicroCinema.github.io/

  24. arXiv:2311.16974  [pdf, other]

    cs.CV

    COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design

    Authors: Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo

    Abstract: Graphic design, which has been evolving since the 15th century, plays a crucial role in advertising. The creation of high-quality designs demands design-oriented planning, reasoning, and layer-wise generation. Unlike the recent CanvaGPT, which integrates GPT-4 with existing design templates to build a custom GPT, this paper introduces the COLE system - a hierarchical generation framework designed…

    Submitted 18 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Technical report. Project page: https://graphic-design-generation-github-io.vercel.app/

  25. arXiv:2311.12062  [pdf, other]

    cs.CV cs.AI

    PBWR: Parametric Building Wireframe Reconstruction from Aerial LiDAR Point Clouds

    Authors: Shangfeng Huang, Ruisheng Wang, Bo Guo, Hongxin Yang

    Abstract: In this paper, we present an end-to-end 3D building wireframe reconstruction method to regress edges directly from aerial LiDAR point clouds. Our method, named Parametric Building Wireframe Reconstruction (PBWR), takes aerial LiDAR point clouds and initial edge entities as input, and fully uses self-attention mechanism of transformers to regress edge parameters without any intermediate steps such a…

    Submitted 18 November, 2023; originally announced November 2023.

  26. arXiv:2311.03756  [pdf, other]

    cs.LG cs.AI eess.SY

    Learning Decentralized Traffic Signal Controllers with Multi-Agent Graph Reinforcement Learning

    Authors: Yao Zhang, Zhiwen Yu, Jun Zhang, Liang Wang, Tom H. Luan, Bin Guo, Chau Yuen

    Abstract: This paper considers optimal traffic signal control in smart cities, which has been taken as a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, attaining controller adaptivity and scalability stands out as a primary challenge. Capturing the spatial-temporal correlation among traffic lights under the framework of Multi-Agent Reinforcem…

    Submitted 7 November, 2023; originally announced November 2023.

  27. arXiv:2310.16547  [pdf, other]

    cs.DC

    AdaMEC: Towards a Context-Adaptive and Dynamically-Combinable DNN Deployment Framework for Mobile Edge Computing

    Authors: Bowen Pang, Sicong Liu, Hongli Wang, Bin Guo, Yuzhan Wang, Hao Wang, Zhenli Sheng, Zhongyi Wang, Zhiwen Yu

    Abstract: With the rapid development of deep learning, recent research on intelligent and interactive mobile applications (e.g., health monitoring, speech recognition) has attracted extensive attention. And these applications necessitate the mobile edge computing scheme, i.e., offloading partial computation from mobile devices to edge devices for inference acceleration and transmission load reduction. The c…

    Submitted 25 October, 2023; originally announced October 2023.

  28. arXiv:2310.07268  [pdf, other]

    cs.LG

    RaftFed: A Lightweight Federated Learning Framework for Vehicular Crowd Intelligence

    Authors: Changan Yang, Yaxing Chen, Yao Zhang, Helei Cui, Zhiwen Yu, Bin Guo, Zheng Yan, Zijiang Yang

    Abstract: Vehicular crowd intelligence (VCI) is an emerging research field. Facilitated by state-of-the-art vehicular ad-hoc networks and artificial intelligence, various VCI applications come to place, e.g., collaborative sensing, positioning, and mapping. The collaborative property of VCI applications generally requires data to be shared among participants, thus forming network-wide intelligence. How to f…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 8 pages, 8 figures

  29. arXiv:2309.16496  [pdf, other]

    cs.CV

    CCEdit: Creative and Controllable Video Editing via Diffusion Models

    Authors: Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

    Abstract: In this paper, we present CCEdit, a versatile generative video editing framework based on diffusion models. Our approach employs a novel trident network structure that separates structure and appearance control, ensuring precise and creative editing capabilities. Utilizing the foundational ControlNet architecture, we maintain the structural integrity of the video during editing. The incorporation…

    Submitted 6 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  30. arXiv:2309.15500  [pdf, other]

    cs.DC

    AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices

    Authors: Lehao Wang, Zhiwen Yu, Haoyi Yu, Sicong Liu, Yaxiong Xie, Bin Guo, Yunxin Liu

    Abstract: Mobile video applications today have attracted significant attention. Deep learning model (e.g. deep neural network, DNN) compression is widely used to enable on-device inference for facilitating robust and private mobile video applications. The compressed DNN, however, is vulnerable to the agnostic data drift of the live video captured from the dynamically changing mobile scenarios. To combat the…

    Submitted 25 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by IEEE Transactions on Mobile Computing 2023

  31. arXiv:2309.15467  [pdf, other]

    cs.LG cs.AI cs.DC

    Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey

    Authors: Sicong Liu, Bin Guo, Cheng Fang, Ziqi Wang, Shiyan Luo, Zimu Zhou, Zhiwen Yu

    Abstract: The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the widespread use of intelligent infrastructures and the impressive success of deep learning (DL). With the deployment of DL on various intelligent infrastructures featuring rich sensors and weak DL computing capabilities, a diverse range of AIoT applications has become possible. However, DL models are notoriously…

    Submitted 27 September, 2023; originally announced September 2023.

  32. arXiv:2309.05361  [pdf]

    physics.plasm-ph cs.AI cs.LG

    Cross-tokamak Disruption Prediction based on Physics-Guided Feature Extraction and domain adaptation

    Authors: Chengshuo Shen, Wei Zheng, Bihao Guo, Yonghua Ding, Dalong Chen, Xinkun Ai, Fengming Xue, Yu Zhong, Nengchao Wang, Biao Shen, Binjia Xiao, Zhongyong Chen, Yuan Pan, J-TEXT team

    Abstract: The high acquisition cost and the significant demand for disruptive discharges for data-driven disruption prediction models in future tokamaks pose an inherent contradiction in disruption prediction research. In this paper, we demonstrated a novel approach to predict disruption in a future tokamak using only a few discharges. The first step is to use the existing understanding of physics to extrac…

    Submitted 1 November, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 17 pages, 9 figures

  33. arXiv:2309.03895  [pdf, other]

    cs.CV

    InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

    Authors: Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

    Abstract: We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Unlike existing approaches that integrate prior knowledge and pre-define the output space (e.g., categories and coordinates) for each vision task, we cast diverse vision tasks into a human-intuitive image-manipulating process whose output space is a flexible and interactive pi…

    Submitted 7 September, 2023; originally announced September 2023.

  34. arXiv:2309.01343  [pdf, other]

    cs.IR

    Distributional Domain-Invariant Preference Matching for Cross-Domain Recommendation

    Authors: Jing Du, Zesheng Ye, Bin Guo, Zhiwen Yu, Lina Yao

    Abstract: Learning accurate cross-domain preference mappings in the absence of overlapped users/items has presented a persistent challenge in Non-overlapping Cross-domain Recommendation (NOCDR). Despite the efforts made in previous studies to address NOCDR, several limitations still exist. Specifically, 1) while some approaches substitute overlapping users/items with overlapping behaviors, they cannot handl…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 9 pages, 5 figures, full research paper accepted by ICDM 2023

  35. arXiv:2308.11088  [pdf, other]

    cs.AI cs.MA

    Collaborative Route Planning of UAVs, Workers and Cars for Crowdsensing in Disaster Response

    Authors: Lei Han, Chunyu Tu, Zhiwen Yu, Zhiyong Yu, Weihua Shan, Liang Wang, Bin Guo

    Abstract: Efficiently obtaining the up-to-date information in the disaster-stricken area is the key to successful disaster response. Unmanned aerial vehicles (UAVs), workers and cars can collaborate to accomplish sensing tasks, such as data collection, in disaster-stricken areas. In this paper, we explicitly address the route planning for a group of agents, including UAVs, workers, and cars, with the goal o…

    Submitted 21 August, 2023; originally announced August 2023.

  36. arXiv:2308.04409  [pdf, other]

    cs.CV

    V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

    Authors: Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

    Abstract: We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address…

    Submitted 8 August, 2023; originally announced August 2023.

  37. arXiv:2307.15490  [pdf, other]

    cs.DC cs.NI cs.SI

    Unleashing the Potential of Stage-Wise Decision-Making in Scheduling of Graph-Structured Tasks over Mobile Vehicular Clouds

    Authors: Minghui Liwang, Bingshuo Guo, Zhanxi Ma, Yuhan Su, Jian Jin, Seyyedali Hosseinalipour, Xianbin Wang, Huaiyu Dai

    Abstract: To effectively process high volume of data across a fleet of dynamic and distributed vehicles, it is crucial to implement resource provisioning techniques that can provide reliable, cost-effective, and timely computing services. This article explores computation-intensive task scheduling over mobile vehicular clouds (MVCs). We use undirected weighted graphs (UWGs) to model both the execution of ta…

    Submitted 20 December, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

  38. arXiv:2307.14008  [pdf, other]

    cs.CV

    Adaptive Frequency Filters As Efficient Global Token Mixers

    Authors: Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, Baining Guo

    Abstract: Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployments, especially on mobile devices, still suffer from noteworthy challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In th…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV2023

  39. arXiv:2307.12304  [pdf]

    cs.LG cs.CE

    Physics-Informed Machine Learning of Argon Gas-Driven Melt Pool Dynamics

    Authors: R. Sharma, W. Grace Guo, M. Raissi, Y. B. Guo

    Abstract: Melt pool dynamics in metal additive manufacturing (AM) is critical to process stability, microstructure formation, and final properties of the printed materials. Physics-based simulation including computational fluid dynamics (CFD) is the dominant approach to predict melt pool dynamics. However, the physics-based simulation approaches suffer from the inherent issue of very high computational cost…

    Submitted 23 July, 2023; originally announced July 2023.

  40. arXiv:2307.10168  [pdf, other]

    cs.CL cs.HC

    LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

    Authors: Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang

    Abstract: LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but…

    Submitted 19 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  41. arXiv:2307.04520  [pdf, other]

    cs.CV

    Efficient Match Pair Retrieval for Large-scale UAV Images via Graph Indexed Global Descriptor

    Authors: San Jiang, Yichen Ma, Qingquan Li, Wanshou Jiang, Bingxuan Guo, Lelin Li, Lizhe Wang

    Abstract: SfM (Structure from Motion) has been extensively used for UAV (Unmanned Aerial Vehicle) image orientation. Its efficiency is directly influenced by feature matching. Although image retrieval has been extensively used for match pair selection, high computational costs are consumed due to a large number of local features and the large size of the used codebook. Thus, this paper proposes an efficient…

    Submitted 10 July, 2023; originally announced July 2023.

  42. arXiv:2307.04449  [pdf, other]

    physics.comp-ph cs.LG

    Graph Convolutional Networks for Simulating Multi-phase Flow and Transport in Porous Media

    Authors: Jiamin Jiang, Bo Guo

    Abstract: Numerical simulation of multi-phase fluid dynamics in porous media is critical for many energy and environmental applications in Earth's subsurface. Data-driven surrogate modeling provides computationally inexpensive alternatives to high-fidelity numerical simulators. While the commonly used convolutional neural networks (CNNs) are powerful in approximating partial differential equation solutions,…

    Submitted 15 April, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

  43. arXiv:2306.08126  [pdf, other]

    cs.CL cs.AI

    PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer

    Authors: Xu Han, Bin Guo, Yoon Jung, Benjamin Yao, Yu Zhang, Xiaohu Liu, Chenlei Guo

    Abstract: Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models with…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 10 pages, 3 figures, accepted to SustaiNLP 2023

  44. arXiv:2305.07254  [pdf, other]

    cs.CR

    A Lightweight Authentication Protocol against Modeling Attacks based on a Novel LFSR-APUF

    Authors: Yao Wang, Xue Mei, Zhengtai Chang, Wenbing Fan, Benqing Guo, Zhi Quan

    Abstract: Simple authentication protocols based on conventional physical unclonable function (PUF) are vulnerable to modeling attacks and other security threats. This paper proposes an arbiter PUF based on a linear feedback shift register (LFSR-APUF). Different from the previously reported linear feedback shift register for challenge extension, the proposed scheme feeds the external random challenges into t…

    Submitted 12 May, 2023; originally announced May 2023.

  45. arXiv:2305.01968  [pdf]

    eess.IV cs.CV cs.LG

    DPSeq: A Novel and Efficient Digital Pathology Classifier for Predicting Cancer Biomarkers using Sequencer Architecture

    Authors: Min Cen, Xingyu Li, Bangwei Guo, Jitendra Jonnagaddala, Hong Zhang, Xu Steven Xu

    Abstract: In digital pathology tasks, transformers have achieved state-of-the-art results, surpassing convolutional neural networks (CNNs). However, transformers are usually complex and resource intensive. In this study, we developed a novel and efficient digital pathology classifier called DPSeq, to predict cancer biomarkers through fine-tuning a sequencer architecture integrating horizon and vertical bidi…

    Submitted 3 May, 2023; originally announced May 2023.

  46. arXiv:2304.06906  [pdf, other]

    cs.CV

    Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

    Authors: Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo

    Abstract: The use of pretrained backbones with fine-tuning has been successful for 2D vision and natural language processing tasks, showing advantages over task-specific networks. In this work, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear me…

    Submitted 15 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Project page: https://yukichiii.github.io/project/swin3D/swin3D.html

  47. arXiv:2303.14965  [pdf]

    physics.plasm-ph cs.AI cs.LG

    Disruption Precursor Onset Time Study Based on Semi-supervised Anomaly Detection

    Authors: Xinkun Ai, Wei Zheng, Ming Zhang, Dalong Chen, Chengshuo Shen, Bihao Guo, Bingjia Xiao, Yu Zhong, Nengchao Wang, Zhoujun Yang, Zhipeng Chen, Zhongyong Chen, Yonghua Ding, Yuan Pan, J-TEXT team

    Abstract: The full understanding of plasma disruption in tokamaks is currently lacking, and data-driven methods are extensively used for disruption prediction. However, most existing data-driven disruption predictors employ supervised learning techniques, which require labeled training data. The manual labeling of disruption precursors is a tedious and challenging task, as some precursors are difficult to a…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: 21 pages, 11 figures

    Report number: S1738-5733(23)00556-9

    Journal ref: Nuclear Engineering and Technology 2023

  48. arXiv:2303.13091  [pdf, other]

    cs.IR cs.IT

    Limits of Predictability in Top-N Recommendation

    Authors: En Xu, Zhiwen Yu, Ying Zhang, Bin Guo, Lina Yao

    Abstract: Top-N recommendation aims to recommend each consumer a small set of N items from a large collection of items, and its accuracy is one of the most common indexes to evaluate the performance of a recommendation system. While a large number of algorithms are proposed to push the Top-N accuracy by learning the user preference from their history purchase data, a predictability question is naturally rai…

    Submitted 23 March, 2023; originally announced March 2023.

  49. arXiv:2303.10126  [pdf, other]

    cs.CV

    IRGen: Generative Modeling for Image Retrieval

    Authors: Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Baining Guo

    Abstract: While generative modeling has been ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the current unified theme. Our framework, IRGen, is a unified model that enables end-to-end differentiable search,…

    Submitted 28 June, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

  50. arXiv:2303.09556  [pdf, other]

    cs.CV

    Efficient Diffusion Training via Min-SNR Weighting Strategy

    Authors: Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, Baining Guo

    Abstract: Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence. In this paper, we discovered that the slow convergence is partly due to conflicting optimization directions between timesteps. To address this issue, we treat the diffusion training as a multi-task learning problem, and introduce a simple yet effectiv…

    Submitted 11 March, 2024; v1 submitted 16 March, 2023; originally announced March 2023.