Showing 1–50 of 138 results for author: Yuan, K

Searching in archive cs.
  1. arXiv:2406.20006  [pdf, other]

    cs.LG

    On the Trade-off between Flatness and Optimization in Distributed Learning

    Authors: Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed

    Abstract: This paper proposes a theoretical framework to evaluate and compare the performance of gradient-descent algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have noticed that convergence toward flat local minima tends to enhance the generalization ability of learning algorithms. This work discovers two interesting results. F…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories and potential encounters with previously unknown or unseen objects. These challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.00764  [pdf, other]

    cs.LG

    IENE: Identifying and Extrapolating the Node Environment for Out-of-Distribution Generalization on Graphs

    Authors: Haoran Yang, Xiaobing Pei, Kai Yuan

    Abstract: Due to the performance degradation of graph neural networks (GNNs) under distribution shifts, out-of-distribution (OOD) generalization on graphs has received widespread attention. A novel perspective involves distinguishing potential confounding biases from different environments through environmental identification, enabling the model to escape environmentally-sensitive correlations a…

    Submitted 2 June, 2024; originally announced June 2024.

  4. arXiv:2405.19988  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

    Authors: Minttu Alakuijala, Reginald McLean, Isaac Woungang, Nariman Farsad, Samuel Kaski, Pekka Marttinen, Kai Yuan

    Abstract: Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from su…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages in the main text, 16 pages including references and supplementary materials. 4 figures and 3 tables in the main text, 1 table in supplementary materials

  5. arXiv:2405.17765  [pdf, other]

    cs.CV

    PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild

    Authors: Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang

    Abstract: Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video, e.g., content attractiveness, distortion type, motion pattern, and level. However, annotating the mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets and poses a significant obstacle for deep learning-based me…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: CVPR 2024, 11 pages, 4 figures, 7 tables

  6. arXiv:2405.10075  [pdf, other]

    cs.CV cs.AI

    HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition

    Authors: Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy

    Abstract: Natural language could play an important role in developing generalist surgical models by providing a broad source of supervision from raw texts. This flexible form of supervision can enable the model's transferability across datasets and tasks, as natural language can be used to reference learned visual concepts or describe new ones. In this work, we present HecVL, a novel hierarchical video-langu…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by MICCAI2024

  7. arXiv:2404.14443  [pdf]

    cs.CL cs.AI

    Evaluation of Machine Translation Based on Semantic Dependencies and Keywords

    Authors: Kewei Yuan, Qiurong Zhao, Yang Xu, Xiao Zhang, Huansheng Ning

    Abstract: Since most existing machine translation evaluation algorithms consider only lexical and syntactic information and ignore the deep semantic information contained in the sentence, this paper proposes a computational method for evaluating the semantic correctness of machine translations based on reference translations and incorporating semantic dependencies and sentence…

    Submitted 20 April, 2024; originally announced April 2024.

  8. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on KVQ, a dataset collected from the popular short-form video platform Kuaishou/Kwai. The KVQ database is divided into three parts: 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  9. arXiv:2404.09790  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution (×4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution (×4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  10. arXiv:2403.19140  [pdf, other]

    cs.CV cs.AI

    QNCD: Quantization Noise Correction for Diffusion Models

    Authors: Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan

    Abstract: Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity. However, their widespread adoption is hindered by the intensive computation required during the iterative denoising process. Post-training quantization (PTQ) presents a solution to accelerate sampling, albeit at the expense of sample quality, especially in low-bit settings. Addressing this, our s…

    Submitted 28 March, 2024; originally announced March 2024.

  11. arXiv:2403.17607  [pdf, other]

    cs.AI

    Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs

    Authors: Kai Yuan, Christoph Bauinger, Xiangyi Zhang, Pascal Baehr, Matthias Kirchhart, Darius Dabert, Adrien Tousnakhoff, Pierre Boudier, Michael Paulitsch

    Abstract: This paper presents a SYCL implementation of Multi-Layer Perceptrons (MLPs), which targets and is optimized for the Intel Data Center GPU Max 1550. To increase the performance, our implementation minimizes the slow global memory accesses by maximizing the data reuse within the general register file and the shared local memory by fusing the operations in each layer of the MLP. We show with a simple…

    Submitted 26 March, 2024; originally announced March 2024.

  12. arXiv:2403.13756  [pdf, other]

    cs.CV

    Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model

    Authors: Diwei Wang, Kun Yuan, Candice Muller, Frédéric Blanc, Nicolas Padoy, Hyewon Seo

    Abstract: We present a knowledge augmentation strategy for assessing the diagnostic groups and gait impairment from monocular gait videos. Based on a large-scale pre-trained Vision Language Model (VLM), our model learns and improves visual, textual, and numerical representations of patient gait videos through collective learning across three distinct modalities: gait videos, class-specific descriptions,…

    Submitted 20 March, 2024; originally announced March 2024.

  13. arXiv:2403.11451  [pdf, other]

    cs.CV

    CasSR: Activating Image Power for Real-World Image Super-Resolution

    Authors: Haolan Chen, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Wei Hu

    Abstract: The objective of image super-resolution is to generate clean and high-resolution images from degraded versions. Recent advancements in diffusion modeling have led to the emergence of various image super-resolution techniques that leverage pretrained text-to-image (T2I) models. Nevertheless, due to the prevalent severe degradation in low-resolution images and the inherent characteristics of diffusi…

    Submitted 17 March, 2024; originally announced March 2024.

  14. arXiv:2403.07338  [pdf, ps, other]

    cs.IT cs.MM eess.SP

    D²-JSCC: Digital Deep Joint Source-channel Coding for Semantic Communications

    Authors: Jianhao Huang, Kai Yuan, Chuan Huang, Kaibin Huang

    Abstract: Semantic communications (SemCom) have emerged as a new paradigm for supporting sixth-generation applications, where semantic features of data are transmitted using artificial intelligence algorithms to attain high communication efficiencies. Most existing SemCom techniques utilize deep neural networks (DNNs) to implement analog source-channel mappings, which are incompatible with existing digital…

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2403.05916  [pdf, other]

    cs.CV cs.AI

    GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

    Authors: Hao Lu, Xuesong Niu, Jiyao Wang, Yin Wang, Qingyong Hu, Jiaqi Tang, Yuting Zhang, Kaishen Yuan, Bin Huang, Zitong Yu, Dengbo He, Shuiguang Deng, Hao Chen, Yingcong Chen, Shiguang Shan

    Abstract: Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite their success in language understanding, it is critical to evaluate the performance of downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing,…

    Submitted 10 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  16. arXiv:2403.05220  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.TO

    Synthetic Privileged Information Enhances Medical Image Representation Learning

    Authors: Lucas Farndale, Chris Walsh, Robert Insall, Ke Yuan

    Abstract: Multimodal self-supervised representation learning has consistently proven to be a highly effective method in medical image analysis, offering strong task performance and producing biologically informed insights. However, these methods heavily rely on large, paired datasets, which is prohibitive for their use in scenarios where paired data does not exist, or there is only a small amount available.…

    Submitted 8 March, 2024; originally announced March 2024.

  17. arXiv:2403.05049  [pdf, other]

    cs.CV

    XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

    Authors: Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou

    Abstract: Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution (ISR) recently. However, as low-resolution (LR) images often undergo severe degradation, it is challenging for ISR models to perceive the semantic and degradation information, resulting in restored images with incorrect content or unrealistic artifacts. To address th…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 19 pages, 7 figures

  18. arXiv:2403.04697  [pdf, other]

    cs.CV cs.AI

    AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

    Authors: Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang

    Abstract: Facial Action Units (AUs) are a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable parameters on scarce AU-annotated datasets or heavy reliance on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a prom…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 19 pages, 6 figures

  19. arXiv:2403.01753  [pdf, other]

    cs.CV

    Training-Free Pretrained Model Merging

    Authors: Zhengqi Xu, Ke Yuan, Huiqiong Wang, Yong Wang, Mingli Song, Jie Song

    Abstract: Recently, model merging techniques have surfaced as a solution to combine multiple single-talent models into a single multi-talent model. However, previous endeavors in this field have either necessitated additional training or fine-tuning processes, or required that the models possess the same pre-trained initialization. In this work, we identify a common drawback in prior works w.r.t. the inconsi…

    Submitted 15 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: CVPR2024 accepted

  20. arXiv:2402.07220  [pdf, other]

    eess.IV cs.CV

    KVQ: Kwai Video Quality Assessment for Short-form Videos

    Authors: Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen

    Abstract: Short-form UGC video platforms, like Kwai and TikTok, have been an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement, kaleidoscope creation, etc. However, the advancing content-generation modes, e.g., special effects, and sophisticated processing workflows, e.g., de-artifacts, have introduced significant challenges to recent UGC video quality assessment: (i…

    Submitted 20 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: 19 pages

  21. arXiv:2402.05529  [pdf, other]

    cs.LG cs.MA

    Asynchronous Diffusion Learning with Agent Subsampling and Local Updates

    Authors: Elsa Rizk, Kun Yuan, Ali H. Sayed

    Abstract: In this work, we examine a network of agents operating asynchronously, aiming to discover an ideal global model that suits individual local datasets. Our assumption is that each agent independently chooses when to participate throughout the algorithm and the specific subset of its neighbourhood with which it will cooperate at any given moment. When an agent chooses to take part, it undergoes multi…

    Submitted 8 February, 2024; originally announced February 2024.

  22. arXiv:2402.03167  [pdf, other]

    math.OC cs.LG stat.ML

    Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity

    Authors: Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

    Abstract: Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. Howev…

    Submitted 26 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 37 pages, 6 figures

  23. arXiv:2402.01933  [pdf, other]

    eess.AS cs.SD

    ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush

    Authors: Kuang Yuan, Mohamed Ibrahim, Yiwen Song, Guoxiang Deng, Suvendra Vijayan, Robert Nerone, Akshay Gadre, Swarun Kumar

    Abstract: Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are the most accurate gold standard for dental disease detection. Unfortunately, regular X-ray exams are still a privilege for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?"…

    Submitted 2 February, 2024; originally announced February 2024.

    ACM Class: J.3; C.3; H.5.2

  24. arXiv:2401.14915  [pdf, other]

    cs.HC cs.AI

    Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with Students

    Authors: Chengbo Zheng, Kangyu Yuan, Bingcan Guo, Reza Hadi Mogavi, Zhenhui Peng, Shuai Ma, Xiaojuan Ma

    Abstract: The increasing use of Artificial Intelligence (AI) by students in learning presents new challenges for assessing their learning outcomes in project-based learning (PBL). This paper introduces a co-design study to explore the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate an alternative worl…

    Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Conditionally accepted by CHI '24

  25. arXiv:2401.03664  [pdf]

    eess.IV cs.CV cs.LG

    Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

    Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

    Abstract: This paper focuses on the classification of breast ultrasound images and on measuring the reliability of classification results. We propose a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature a…

    Submitted 7 January, 2024; originally announced January 2024.

  26. arXiv:2312.10251  [pdf, other]

    cs.CV cs.AI

    Advancing Surgical VQA with Scene Graph Knowledge

    Authors: Kun Yuan, Manasi Kattel, Joel L. Lavanchy, Nassir Navab, Vinkle Srivastav, Nicolas Padoy

    Abstract: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with language capabilities is emerging as a necessity. Our work aims to advance Visual Question Answering (VQA) in the surgical context with scene graph knowledge, addressing t…

    Submitted 24 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: IPCAI 2024, Int J CARS (2024)

  27. arXiv:2312.02111  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.TO

    TriDeNT: Triple Deep Network Training for Privileged Knowledge Distillation in Histopathology

    Authors: Lucas Farndale, Robert Insall, Ke Yuan

    Abstract: Computational pathology models rarely utilise data that will not be available for inference. This means most models cannot learn from highly informative data such as additional immunohistochemical (IHC) stains and spatial transcriptomics. We present TriDeNT, a novel self-supervised method for utilising privileged data that is not available during inference to improve performance. We demonstrate th…

    Submitted 5 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  28. arXiv:2311.16420  [pdf, other]

    cs.LG cs.CV

    Model-free Test Time Adaptation for Out-Of-Distribution Detection

    Authors: YiFan Zhang, Xue Wang, Tian Zhou, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: Out-of-distribution (OOD) detection is essential for the reliability of ML models. Most existing methods for OOD detection learn a fixed decision criterion from a given in-distribution dataset and apply it universally to decide if a data point is OOD. Recent work [fang2022is] shows that given only in-distribution data, it is impossible to reliably detect OOD data without extra assumptions. Mo…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 10 figures

  29. arXiv:2310.15598  [pdf, other]

    cs.IT

    Coded Computing for Half-Duplex Wireless Distributed Computing Systems via Interference Alignment

    Authors: Youlong Wu, Zhenhao Huang, Kai Yuan, Shuai Ma, Yue Bi

    Abstract: Distributed computing frameworks such as MapReduce and Spark are often used to process large-scale data computing jobs. In wireless scenarios, data exchange among distributed nodes suffers severely from a communication bottleneck caused by limited communication resources such as bandwidth and power. To address this problem, we propose a coded parallel computing (CPC) scheme for distributed…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 17 pages, 6 figures

  30. arXiv:2310.07983  [pdf, other]

    cs.LG math.OC stat.ML

    Revisiting Decentralized ProxSkip: Achieving Linear Speedup

    Authors: Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao

    Abstract: The ProxSkip algorithm for decentralized and federated learning is gaining increasing attention due to its proven benefits in accelerating communication complexity while maintaining robustness against data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not achieve linear speedup, where convergence performance increases linearly with respect…

    Submitted 19 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  31. arXiv:2309.16179  [pdf, other]

    cs.CV

    BEVHeight++: Toward Robust Visual Centric 3D Object Detection

    Authors: Lei Yang, Tao Tang, Jun Li, Peng Chen, Kun Yuan, Li Wang, Yi Huang, Xinyu Zhang, Kaicheng Yu

    Abstract: While most recent autonomous driving systems focus on developing perception methods for ego-vehicle sensors, people tend to overlook an alternative approach that leverages intelligent roadside cameras to extend perception beyond the visual range. We discover that state-of-the-art vision-centric bird's eye view detection methods perform poorly on roadside cameras. This is b…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.08498

  32. arXiv:2308.07775  [pdf]

    cs.RO cs.AI cs.LG eess.SY

    Hierarchical generative modelling for autonomous robots

    Authors: Kai Yuan, Noor Sajid, Karl Friston, Zhibin Li

    Abstract: Humans can produce complex whole-body motions when interacting with their surroundings, by planning, executing and combining individual limb movements. We investigated this fundamental aspect of motor control in the setting of autonomous robotic operations. We approach this problem by hierarchical generative modelling equipped with multi-level planning (for autonomous task completion) that mimics th…

    Submitted 15 August, 2023; originally announced August 2023.

  33. arXiv:2308.07770  [pdf, other]

    cs.CV

    Multi-scale Promoted Self-adjusting Correlation Learning for Facial Action Unit Detection

    Authors: Xin Liu, Kaishen Yuan, Xuesong Niu, Jingang Shi, Zitong Yu, Huanjing Yue, Jingyu Yang

    Abstract: Facial Action Unit (AU) detection is a crucial task in affective computing and social robotics as it helps to identify emotions expressed through facial expressions. Anatomically, there are innumerable correlations between AUs, which contain rich information and are vital for AU detection. Previous methods used fixed AU correlations based on expert experience or statistical rules on specific bench…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 13 pages, 7 figures

  34. arXiv:2308.00729  [pdf, other]

    cs.CV

    Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment

    Authors: Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li

    Abstract: Video quality assessment (VQA) has attracted growing attention in recent years. However, the great expense of annotating large-scale VQA datasets has become the main obstacle for current deep-learning methods. To surmount the constraint of insufficient training data, in this paper, we first consider the complete range of video distribution diversity (i.e., content, distortion, motion) and employ divers…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 10 pages, 5 figures, to appear in ACM MM 2023

  35. arXiv:2307.16813  [pdf, other]

    cs.CV

    Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment

    Authors: Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen

    Abstract: Video Quality Assessment (VQA), which aims to predict the perceptual quality of a video, has attracted rising attention with the rapid development of streaming media platforms such as Facebook, TikTok, and Kwai. Compared with other sequence-based visual tasks (e.g., action recognition), VQA faces two underestimated challenges that remain unresolved in User Generated Content (UGC) videos. …

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 10 pages, 7 figures, to appear in ACM MM 2023

  36. arXiv:2307.15220  [pdf, other]

    cs.CV cs.AI

    Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

    Authors: Kun Yuan, Vinkle Srivastav, Tong Yu, Joel L. Lavanchy, Pietro Mascagni, Nassir Navab, Nicolas Padoy

    Abstract: Recent advancements in surgical computer vision applications have been driven by fully-supervised methods, primarily using only visual data. These methods rely on manually annotated surgical videos to predict a fixed set of object categories, limiting their generalizability to unseen surgical procedures and downstream tasks. In this work, we put forward the idea that the surgical video lectures av…

    Submitted 13 January, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

  37. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  38. arXiv:2306.16504  [pdf, other]

    cs.LG math.OC

    Momentum Benefits Non-IID Federated Learning Simply and Provably

    Authors: Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan

    Abstract: Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two prominent algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, whi…

    Submitted 5 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

  39. arXiv:2306.07307  [pdf, other]

    cs.LG cs.AI

    Online Prototype Alignment for Few-shot Policy Transfer

    Authors: Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Ruizhi Chen, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, Yunji Chen

    Abstract: Domain adaptation in reinforcement learning (RL) mainly deals with changes in observation when transferring the policy to a new environment. Many traditional approaches to domain adaptation in RL learn a mapping function between the source and target domains in explicit or implicit ways. However, they typically require access to abundant data from the target domain. Besides, they ofte…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted at ICML2023

  40. arXiv:2306.00256  [pdf, other]

    cs.LG

    DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

    Authors: Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin

    Abstract: Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their c…

    Submitted 31 May, 2023; originally announced June 2023.

  41. arXiv:2305.16297  [pdf, other]

    cs.LG cs.DC math.OC

    Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

    Authors: Yutong He, Xinmeng Huang, Kun Yuan

    Abstract: Communication compression is a common technique in distributed optimization that can alleviate communication overhead by transmitting compressed gradients and model parameters. However, compression can introduce information distortion, which slows down convergence and incurs more communication rounds to achieve desired solutions. Given the trade-off between lower per-round communication costs and…

    Submitted 10 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023

  42. arXiv:2305.07612  [pdf, other]

    cs.LG cs.DC math.OC

    Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

    Authors: Yutong He, Xinmeng Huang, Yiming Chen, Wotao Yin, Kun Yuan

    Abstract: Communication compression is an essential strategy for alleviating communication overhead by reducing the volume of information exchanged between computing nodes in large-scale distributed stochastic optimization. Although numerous algorithms with convergence guarantees have been obtained, the optimal performance limit under communication compression remains unclear. In this paper, we investigat…

    Submitted 12 May, 2023; originally announced May 2023.

  43. arXiv:2304.12566  [pdf, other]

    cs.LG

    AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

    Authors: Yi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: Many recent machine learning tasks focus on developing models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several studies show that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptive (TTA) methods have been proposed. Existing TTA methods require offline target…

    Submitted 9 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: 30 pages, 12 figures

    Journal ref: The Fortieth International Conference on Machine Learning, ICML, 2023

  44. arXiv:2304.06440  [pdf, other]

    cs.CV

    Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

    Authors: Kai Zhao, Kun Yuan, Ming Sun, Xing Wen

    Abstract: Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complicated quality-related factors, in this paper, we decompose video into three levels (i.e., patch level, frame level, and clip level), and propose a novel Zoom-VQA archite…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023 Workshop

  45. arXiv:2303.10656  [pdf, other]

    eess.IV cs.CV

    More From Less: Self-Supervised Knowledge Distillation for Routine Histopathology Data

    Authors: Lucas Farndale, Robert Insall, Ke Yuan

    Abstract: Medical imaging technologies are generating increasingly large amounts of high-quality, information-dense data. Despite the progress, practical use of advanced imaging technologies for research and diagnosis remains limited by cost and availability, so information-sparse data such as H&E stains are relied on in practice. The study of diseased tissue requires methods which can leverage these inform…

    Submitted 21 July, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

  46. arXiv:2303.08498  [pdf, other]

    cs.CV

    BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

    Authors: Lei Yang, Kaicheng Yu, Tao Tang, Jun Li, Kun Yuan, Li Wang, Xinyu Zhang, Peng Chen

    Abstract: While most recent autonomous driving systems focus on developing perception methods on ego-vehicle sensors, people tend to overlook an alternative approach to leverage intelligent roadside cameras to extend the perception ability beyond the visual range. We discover that the state-of-the-art vision-centric bird's eye view detection methods have inferior performances on roadside cameras. This is b…

    Submitted 11 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  47. arXiv:2303.00521  [pdf, other]

    cs.CV

    Quality-aware Pre-trained Models for Blind Image Quality Assessment

    Authors: Kai Zhao, Kun Yuan, Ming Sun, Mading Li, Xing Wen

    Abstract: Blind image quality assessment (BIQA) aims to automatically evaluate the perceived quality of a single image, whose performance has been improved by deep learning-based methods in recent years. However, the paucity of labeled data somewhat restrains deep learning-based BIQA methods from unleashing their full potential. In this paper, we propose to solve the problem by a pretext task customized for…

    Submitted 23 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  48. arXiv:2302.06294  [pdf, other]

    eess.IV cs.CV cs.LG

    CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

    Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai, et al. (24 additional authors not shown)

    Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier effor…

    Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published in the Elsevier journal Medical Image Analysis. 25 pages, 15 figures, 8 tables

    Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

  49. arXiv:2212.10744  [pdf, other]

    cs.SD cs.CV

    An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits

    Authors: Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu

    Abstract: Audio-visual approaches involving visual inputs have laid the foundation for recent progress in speech separation. However, the optimization of the concurrent usage of auditory and visual inputs is still an active research area. Inspired by the cortico-thalamo-cortical circuit, in which the sensory processing mechanisms of different modalities modulate one another via the non-lemniscal sensory tha…

    Submitted 22 March, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted by TPAMI 2024

  50. arXiv:2211.00917  [pdf, other]

    cs.RO

    A Novel Autonomous Robotics System for Aquaculture Environment Monitoring

    Authors: Tianqi Zhang, Tong Shen, Kai Yuan, Kaiwen Xue, Huihuan Qian

    Abstract: Implementing fully automatic unmanned surface vehicles (USVs) monitoring water quality is challenging since effectively collecting environmental data while keeping the platform stable and environmentally friendly is hard to approach. To address this problem, we construct a USV that can automatically navigate an efficient path to sample water quality parameters in order to monitor the aquatic environ…

    Submitted 7 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.