Showing 1–50 of 590 results for author: Huang, Q

  1. Sequential Manipulation Against Rank Aggregation: Theory and Algorithm

    Authors: Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, Qingming Huang

    Abstract: Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fu…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE TPAMI URL: https://ieeexplore.ieee.org/document/10564181

  2. Multi-agent Cooperative Games Using Belief Map Assisted Training

    Authors: Qinwei Huang, Chen Luo, Alex B. Wu, Simon Khan, Hai Li, Qinru Qiu

    Abstract: In a multi-agent system, agents share their local observations to gain global situational awareness for decision making and collaboration using a message passing system. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learn…

    Submitted 27 June, 2024; originally announced June 2024.

    Journal ref: ECAI 2023. IOS Press, 2023: 1617-1624

  3. Learning Retrieval Augmentation for Personalized Dialogue Generation

    Authors: Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, Lilian Tang

    Abstract: Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting in current personalized dialogue datasets, typically composed of merely four to five sentences, may not offer comprehensive descriptions of the perso…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP-2023

  4. arXiv:2406.18187  [pdf, other]

    cs.CL cs.AI cs.LG

    Selective Prompting Tuning for Personalized Conversations with LLMs

    Authors: Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang

    Abstract: In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 findings

  5. arXiv:2406.16986  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

    Authors: Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

    Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce…

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.12121  [pdf, other]

    cs.CV

    TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations

    Authors: Bo Sun, Thibault Groueix, Chen Song, Qixing Huang, Noam Aigerman

    Abstract: This work proposes a novel representation of injective deformations of 3D space, which overcomes existing limitations of injective methods: inaccuracy, lack of robustness, and incompatibility with general learning and optimization frameworks. The core idea is to reduce the problem to a deep composition of multiple 2D mesh-based piecewise-linear maps. Namely, we build differentiable layers that pro…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.11238  [pdf, other]

    cs.CL

    What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling

    Authors: Yutong Hu, Quzhe Huang, Kangcheng Luo, Yansong Feng

    Abstract: As the context length that large language models can handle continues to increase, these models demonstrate an enhanced ability to utilize distant information for tasks such as language modeling. This capability contrasts with human reading and writing habits, where it is uncommon to remember and use particularly distant information, except in cases of foreshadowing. In this paper, we aim to explo…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11200  [pdf, other]

    cs.LG cs.CL

    AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval

    Authors: Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou

    Abstract: Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agen…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures, 6 tables

  9. arXiv:2406.10167  [pdf, other]

    cs.CV

    4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

    Authors: Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit Bajaj, Qixing Huang

    Abstract: This paper presents a novel approach 4DRecons that takes a single camera RGB-D sequence of a dynamic subject as input and outputs a complete textured deforming 3D model over time. 4DRecons encodes the output as a 4D neural implicit surface and presents an optimization procedure that combines a data term and two regularization terms. The data term fits the 4D implicit surface to the input partial o…

    Submitted 14 June, 2024; originally announced June 2024.

  10. arXiv:2406.08479  [pdf, other]

    cs.CV

    Real3D: Scaling Up Large Reconstruction Models with Real-World Images

    Authors: Hanwen Jiang, Qixing Huang, Georgios Pavlakos

    Abstract: The default strategy for training single-view Large Reconstruction Models (LRMs) follows the fully supervised route using large-scale datasets of synthetic 3D assets or multi-view captures. Although these resources simplify the training procedure, they are hard to scale up beyond the existing datasets and they are not necessarily representative of the real distribution of object shapes. To address…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Project page: https://hwjiang1510.github.io/Real3D/

  11. arXiv:2406.07209  [pdf, other]

    cs.CV

    MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

    Authors: X. Wang, Siming Fu, Qihan Huang, Wanggui He, Hao Jiang

    Abstract: Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subje…

    Submitted 11 June, 2024; originally announced June 2024.

  12. arXiv:2406.04983  [pdf, other]

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  13. arXiv:2406.03417  [pdf, other]

    cs.CV cs.GR

    CoFie: Learning Compact Neural Surface Representations with Coordinate Fields

    Authors: Hanwen Jiang, Haitao Yang, Georgios Pavlakos, Qixing Huang

    Abstract: This paper introduces CoFie, a novel local geometry-aware neural surface representation. CoFie is motivated by the theoretical analysis of local SDFs with quadratic approximation. We find that local shapes are highly compressive in an aligned coordinate frame defined by the normal and tangent directions of local shapes. Accordingly, we introduce Coordinate Field, which is a composition of coordina…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Project page: https://hwjiang1510.github.io/CoFie/

  14. arXiv:2406.02580  [pdf, other]

    cs.NE cs.LG

    Exploiting Chaotic Dynamics as Deep Neural Networks

    Authors: Shuhong Liu, Nozomi Akashi, Qingyao Huang, Yasuo Kuniyoshi, Kohei Nakajima

    Abstract: Chaos presents complex dynamics arising from nonlinearity and a sensitivity to initial states. These characteristics suggest a depth of expressivity that underscores their potential for advanced computational applications. However, strategies to effectively exploit chaotic dynamics for information processing have largely remained elusive. In this study, we reveal that the essence of chaos can be f…

    Submitted 29 May, 2024; originally announced June 2024.

  15. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  16. arXiv:2405.20810  [pdf, other]

    cs.CV

    Context-aware Difference Distilling for Multi-change Captioning

    Authors: Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang

    Abstract: Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognition ability to reason an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences.…

    Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 main conference (long paper)

  17. arXiv:2405.20624  [pdf, ps, other]

    cs.CL cs.AI

    Leveraging Large Language Models for Entity Matching

    Authors: Qianyu Huang, Tongfang Zhao

    Abstract: Entity matching (EM) is a critical task in data integration, aiming to identify records across different datasets that refer to the same real-world entities. Traditional methods often rely on manually engineered features and rule-based systems, which struggle with diverse and unstructured data. The emergence of Large Language Models (LLMs) such as GPT-4 offers transformative potential for EM, leve…

    Submitted 31 May, 2024; originally announced May 2024.

  18. arXiv:2405.17631  [pdf, other]

    cs.AI cs.CE cs.MA

    BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

    Authors: Yusuf Roohani, Jian Vora, Qian Huang, Zachary Steinhart, Alexander Marson, Percy Liang, Jure Leskovec

    Abstract: Agents based on large language models have shown great potential in accelerating scientific discovery by leveraging their rich background knowledge and reasoning capabilities. Here, we develop BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. We demonstrate our agent on the problem of d…

    Submitted 27 May, 2024; originally announced May 2024.

  19. arXiv:2405.16029  [pdf, other]

    cs.LG

    Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference

    Authors: Huaiguang Cai, Zhi Zhou, Qianyi Huang

    Abstract: With edge intelligence, AI models are increasingly pushed to the edge to serve ubiquitous users. However, due to the drift of model, data, and task, AI model deployed at the edge suffers from degraded accuracy in the inference serving phase. Model retraining handles such drifts by periodically retraining the model with newly arrived data. When colocating model retraining and model inference servin…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by the IEEE INFOCOM 2024 Main Conference

  20. arXiv:2405.12979  [pdf, other]

    cs.CV

    OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

    Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo

    Abstract: The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue,…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  21. arXiv:2405.10508  [pdf, other]

    cs.CV

    ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

    Authors: Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li

    Abstract: In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024 Workshop on AI3DG

  22. arXiv:2405.10497  [pdf, other]

    cs.MM cs.AI cs.CV cs.SI

    SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

    Authors: Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo

    Abstract: Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, a…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795

  23. arXiv:2405.09782  [pdf, other]

    cs.CV

    Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

    Authors: Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Runmin Cong, Xiaochun Cao, Qingming Huang

    Abstract: This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image. We observe that current metrics are size-sensitive, where larger objects are focused, and smaller ones tend to be ignored. We argue that the evaluation should be size-invariant because bias based on size is unjustified withou…

    Submitted 27 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by ICML2024

  24. arXiv:2405.09321  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    ReconBoost: Boosting Can Achieve Modality Reconcilement

    Authors: Cong Hua, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang

    Abstract: This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current paradigms of multi-modal learning tend to explore multi-modal features simultaneously. The resulting gradient prohibits further exploitation of the features in the w…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by ICML2024

  25. arXiv:2405.07780  [pdf, other]

    cs.LG cs.AI cs.CV

    Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition

    Authors: Zhiyong Yang, Qianqian Xu, Zitai Wang, Sicong Li, Boyu Han, Shilong Bao, Xiaochun Cao, Qingming Huang

    Abstract: This paper explores test-agnostic long-tail recognition, a challenging long-tail task where the test label distributions are unknown and arbitrarily imbalanced. We argue that the variation in these distributions can be broken down hierarchically into global and local levels. The global ones reflect a broad range of diversity, while the local ones typically arise from milder changes, often focused…

    Submitted 13 May, 2024; originally announced May 2024.

  26. arXiv:2405.07046  [pdf, other]

    cs.CV

    Retrieval Enhanced Zero-Shot Video Captioning

    Authors: Yunchuan Ma, Laiyun Qing, Guorong Li, Yuankai Qi, Quan Z. Sheng, Qingming Huang

    Abstract: Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose to take advantage of existing pre-trained large-scale vision and language models to directly generate captions with test time adaptation. Specifically, we bridge video and text using three key models: a general video understanding model XCLIP, a general imag…

    Submitted 11 May, 2024; originally announced May 2024.

  27. arXiv:2405.06105  [pdf, ps, other]

    cs.CL

    Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?

    Authors: Yutong Hu, Quzhe Huang, Mingxu Tao, Chen Zhang, Yansong Feng

    Abstract: Recent studies have shown that Large Language Models (LLMs) have the potential to process extremely long text. Many works only evaluate LLMs' long-text processing ability on the language modeling task, with perplexity (PPL) as the evaluation metric. However, in our study, we find that there is no correlation between PPL and LLMs' long-text understanding ability. Besides, PPL may only reflect the m…

    Submitted 9 May, 2024; originally announced May 2024.

  28. arXiv:2405.00248  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Who is Authentic Speaker

    Authors: Qiang Huang

    Abstract: Voice conversion (VC) using deep learning technologies can now generate high quality one-to-many voices and thus has been used in some practical application fields, such as entertainment and healthcare. However, voice conversion can pose potential social issues when manipulated voices are employed for deceptive purposes. Moreover, it is a big challenge to find who are real speakers from the conver…

    Submitted 30 April, 2024; originally announced May 2024.

  29. arXiv:2404.18648  [pdf, other]

    cs.CV

    Uncertainty-boosted Robust Video Activity Anticipation

    Authors: Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang

    Abstract: Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision and autonomous driving. Despite the recent progress, the data uncertainty issue, reflected as the content evolution process and dynamic correlation in event labels, has been somehow ignored. This reduces the model generalization ability and deep understanding…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by T-PAMI

  30. arXiv:2404.18047  [pdf, other]

    cs.RO

    LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots

    Authors: Qingrui Zhao, Mingyuan Li, Yongliang Shi, Xuechao Chen, Zhangguo Yu, Lianqiang Han, Zhenyuan Fu, Jintao Zhang, Chao Li, Yuanxi Zhang, Qiang Huang

    Abstract: High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated. This allows for both position and velocity updates from kinematic measurement. Addition…

    Submitted 27 April, 2024; originally announced April 2024.

  31. arXiv:2404.13207  [pdf, other]

    cs.IR cs.LG

    STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

    Authors: Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec

    Abstract: Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the…

    Submitted 20 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 26 pages, 6 figures

  32. arXiv:2404.11844  [pdf, ps, other]

    cs.CY

    Finding A Taxi with Illegal Driver Substitution Activity via Behavior Modelings

    Authors: Junbiao Pang, Muhammad Ayub Sabir, Zhuyun Wang, Anjing Hu, Xue Yang, Haitao Yu, Qingming Huang

    Abstract: In our urban life, Illegal Driver Substitution (IDS) activity for a taxi is a grave unlawful activity in the taxi industry, possibly causing severe traffic accidents and painful social repercussions. Currently, the IDS activity is manually supervised by law enforcers, i.e., law enforcers empirically choose a taxi and inspect it. The pressing problem of this scheme is the dilemma between the limite…

    Submitted 17 April, 2024; originally announced April 2024.

  33. arXiv:2404.04927  [pdf, ps, other]

    cs.IT

    Holographic Integrated Data and Energy Transfer

    Authors: Qingxiao Huang, Jie Hu, Yizhe Zhao, Kun Yang

    Abstract: Thanks to the application of metamaterials, holographic multiple-input multiple-output (H-MIMO) is expected to achieve a higher spatial diversity gain by enabling the ability to generate any current distribution on the surface. With the aid of electromagnetic (EM) manipulation capability of H-MIMO, integrated data and energy transfer (IDET) system can fully exploit the EM channel to realize energ…

    Submitted 7 April, 2024; originally announced April 2024.

  34. arXiv:2404.04120  [pdf, other]

    cs.CV

    Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification

    Authors: Rui Wang, Chuanfu Shen, Manuel J. Marin-Jimenez, George Q. Huang, Shiqi Yu

    Abstract: Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors in order to adapt to various environments. A more practical approach should involve cross-modality matching across different sensors. Hence, this paper focuses on investigating the problem of cross-modality gait r…

    Submitted 4 April, 2024; originally announced April 2024.

  35. arXiv:2404.02514  [pdf, other]

    cs.CV

    Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

    Authors: Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes while suffering from blurry results, and fail to capture detailed structures caused by the inconsistency between 2D editings. Our critical insight is that low-frequency components of images are more multiview-consistent after editing compare…

    Submitted 3 April, 2024; originally announced April 2024.

  36. arXiv:2404.01862  [pdf, other]

    cs.CV cs.HC cs.MM

    Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

    Authors: Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

    Abstract: Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is n…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures, CVPR 2024

  37. arXiv:2403.20045  [pdf]

    cs.NI

    Blockchain for Energy Market: A Comprehensive Survey

    Authors: Tianqi Jiang, Haoxiang Luo, Kun Yang, Gang Sun, Hongfang Yu, Qi Huang, Athanasios V. Vasilakos

    Abstract: The energy market encompasses the behavior of energy supply and trading within a platform system. By utilizing centralized or distributed trading, energy can be effectively managed and distributed across different regions, thereby achieving market equilibrium and satisfying both producers and consumers. However, recent years have presented unprecedented challenges and difficulties for the developm…

    Submitted 5 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

  38. arXiv:2403.18407  [pdf, other]

    cs.CV cs.AI

    A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification

    Authors: Jiaqi Wu, Junbiao Pang, Baochang Zhang, Qingming Huang

    Abstract: Semi-supervised learning (SSL) is a practical challenge in computer vision. Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, obtain the State Of The Art (SOTA) performances in SSL. These approaches employ a threshold-to-pseudo-label (T2L) process to generate PLs by truncating the confidence scores of unlabeled data predicted by the self-training method. However, self-trained models typical…

    Submitted 27 March, 2024; originally announced March 2024.

  39. arXiv:2403.15559  [pdf, other]

    cs.CV cs.AI

    An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models

    Authors: Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framewor…

    Submitted 22 March, 2024; originally announced March 2024.

  40. arXiv:2403.14349  [pdf, other]

    cs.CV

    On the Concept Trustworthiness in Concept Bottleneck Models

    Authors: Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, Mingli Song

    Abstract: Concept Bottleneck Models (CBMs), which break down the reasoning process into the input-to-concept mapping and the concept-to-label prediction, have garnered significant attention due to their remarkable interpretability achieved by the interpretable concept bottleneck. However, despite the transparency of the concept-to-label prediction, the mapping from the input to the intermediate concept rema…

    Submitted 21 March, 2024; originally announced March 2024.

  41. arXiv:2403.12391  [pdf, other]

    cs.LG cs.AI

    FairSTG: Countering performance heterogeneity via collaborative sample-level optimization

    Authors: Gengyu Lin, Zhengyang Zhou, Qihe Huang, Kuo Yang, Shifen Cheng, Yang Wang

    Abstract: Spatiotemporal learning plays a crucial role in mobile computing techniques to empower smart cities. While existing research has made great efforts to achieve accurate predictions on the overall dataset, they still neglect the significant performance heterogeneity across samples. In this work, we designate the performance heterogeneity as the reason for unfair spatiotemporal learning, which not onl…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Under review by IEEE Transactions on Mobile Computing

  42. arXiv:2403.12010  [pdf, other]

    cs.CV cs.AI cs.GR

    VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

    Authors: Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training,…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: aigc3d.github.io/VideoMV/

  43. arXiv:2403.11974  [pdf, other]

    eess.IV cs.CV

    OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images

    Authors: Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Bo Fu, Catherien C. Liu, Xingtao Zhou

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex…

    Submitted 18 March, 2024; originally announced March 2024.

  44. arXiv:2403.07652  [pdf, other]

    cs.LG cs.CL

    Harder Tasks Need More Experts: Dynamic Routing in MoE Models

    Authors: Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang **, Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

    Abstract: In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity,…

    Submitted 12 March, 2024; originally announced March 2024.

  45. A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

    Authors: Ting Yu, Xiaojun Lin, Shuhui Wang, Weiguo Sheng, Qingming Huang, Jun Yu

    Abstract: Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes. It presents significant potential and challenges due to its closer representation of the real world compared to 2D visual captioning, as well as complexities in data collection and processing of 3D point cloud sources. Despite the pop…

    Submitted 12 March, 2024; originally announced March 2024.

  46. Spatial features of CO2 for occupancy detection in a naturally ventilated school building

    Authors: Qirui Huang, Marc Syndicus, Jérôme Frisch, Christoph van Treeck

    Abstract: Accurate occupancy information helps to improve building energy efficiency and occupant comfort. Occupancy detection methods based on CO2 sensors have received attention due to their low cost and low intrusiveness. In naturally ventilated buildings, the accuracy of CO2-based occupancy detection is generally low in related studies due to the complex ventilation behavior and the difficulty in measur…

    Submitted 28 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Indoor Environments, Volume 1, Issue 3, 2024, 100018, ISSN 2950-3620

    Journal ref: Indoor Environments, Volume 1, Issue 3, 2024, 100018, ISSN 2950-3620

  47. arXiv:2403.06488  [pdf, other]

    cs.CV

    Query-guided Prototype Evolution Network for Few-Shot Segmentation

    Authors: Runmin Cong, Hang Xiong, **peng Chen, Wei Zhang, Qingming Huang, Yao Zhao

    Abstract: Previous Few-Shot Segmentation (FSS) approaches exclusively utilize support features for prototype generation, neglecting the specific requirements of the query. To address this, we present the Query-guided Prototype Evolution Network (QPENet), a new method that integrates query features into the generation process of foreground and background prototypes, thereby yielding customized prototypes att…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TMM 2024

  48. arXiv:2403.05895  [pdf, other]

    cs.CV

    DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos

    Authors: Xiuzhe Wu, Xiaoyang Lyu, Qihao Huang, Yong Liu, Yang Wu, Ying Shan, Xiaojuan Qi

    Abstract: Although considerable advancements have been attained in self-supervised depth estimation from monocular videos, most existing methods often treat all objects in a video as static entities, which however violates the dynamic nature of real-world scenes and fails to model the geometry and motion of moving objects. In this paper, we propose a self-supervised method to jointly learn 3D motion and dep…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 24 pages, 14 figures, Tech Report

  49. arXiv:2403.05834  [pdf, other]

    cs.MM cs.SD eess.AS

    Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information

    Authors: Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng

    Abstract: Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works attempt to enhance dance expressiveness, which includes genre matching, beat alignment, and dance dynamics, from certain aspects. However, the enhancement is quite limited as they lack comprehensive consideration of the aforementioned three factors. In this paper, we propose Expressi…

    Submitted 9 March, 2024; originally announced March 2024.

  50. arXiv:2403.05402  [pdf, other]

    cs.CV

    DualBEV: CNN is All You Need in View Transformation

    Authors: Peidong Li, Wancheng Shen, Qihao Huang, Dixiao Cui

    Abstract: Camera-based Bird's-Eye-View (BEV) perception often struggles between adopting 3D-to-2D or 2D-to-3D view transformation (VT). The 3D-to-2D VT typically employs a resource-intensive Transformer to establish robust correspondences between 3D and 2D features, while the 2D-to-3D VT utilizes the Lift-Splat-Shoot (LSS) pipeline for real-time application, potentially missing distant information. To address…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 16 pages, 6 figures, Tech Report