Showing 1–50 of 141 results for author: Kong, Y

Searching in archive cs.

  1. arXiv:2406.13490  [pdf, other]

    cs.LG cs.GT

    The Surprising Benefits of Base Rate Neglect in Robust Aggregation

    Authors: Yuqing Kong, Shu Wang, Ying Wang

    Abstract: Robust aggregation integrates predictions from multiple experts without knowledge of the experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate. We find that a certain deg…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.11903  [pdf, other]

    q-fin.GN cs.AI q-fin.CP

    A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges

    Authors: Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M. Mulvey, H. Vincent Poor, Qingsong Wen, Stefan Zohren

    Abstract: Recent advances in large language models (LLMs) have unlocked novel opportunities for machine learning applications in the financial domain. These models have demonstrated remarkable capabilities in understanding context, processing vast amounts of data, and generating human-preferred contents. In this survey, we explore the application of LLMs on various financial tasks, focusing on their potenti…

    Submitted 15 June, 2024; originally announced June 2024.

  3. arXiv:2406.04140  [pdf, other]

    cs.SD eess.AS

    STraDa: A Singer Traits Dataset

    Authors: Yuexuan Kong, Viet-Anh Tran, Romain Hennequin

    Abstract: There is a limited amount of large-scale public datasets that contain downloadable music audio files and rich lead singer metadata. To provide such a dataset to benefit research in singing voices, we created Singer Traits Dataset (STraDa) with two subsets: automatic-strada and annotated-strada. The automatic-strada contains twenty-five thousand tracks across numerous genres and languages of more t…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2406.03102  [pdf, other]

    cs.LG cs.AI

    DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

    Authors: Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang

    Abstract: Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby deviating from the Markov assumption. Existing methods usually tackle this issue with end-to-end solutions using state augmentation. However, these black-box approaches often involve incomprehensible processes and redund…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2406.01066  [pdf, other]

    cs.LG

    Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph

    Authors: Weihuang Zheng, Jiashuo Liu, Jiaxing Li, Jiayun Wu, Peng Cui, Youyong Kong

    Abstract: Graph Neural Networks (GNNs) are widely used for node classification tasks but often fail to generalize when training and test nodes come from different distributions, limiting their practicality. To overcome this, recent approaches adopt invariant learning techniques from the out-of-distribution (OOD) generalization field, which seek to establish stable prediction methods across environments. How…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2405.19843  [pdf, other]

    cs.GT

    How Gold to Make the Golden Snitch: Designing the "Game Changer" in Esports

    Authors: Zhihuan Huang, Yuxuan Lu, Yongkang Guo, Yuqing Kong

    Abstract: Many battling games utilize a special item (e.g. Roshan in Defense of the Ancients 2 (DOTA 2), Baron Nashor in League of Legends (LOL), Golden Snitch in Quidditch) as a potential "Game Changer". The reward of this item can enable the underdog to make a comeback. However, if the reward is excessively high, the whole game may devolve into a chase for the "Game Changer". Our research initiates wi…

    Submitted 30 May, 2024; originally announced May 2024.

  7. arXiv:2405.15077  [pdf, other]

    cs.CL cs.AI cs.GT

    Eliciting Informative Text Evaluations with Large Language Models

    Authors: Yuxuan Lu, Shengwei Xu, Yichi Zhang, Yuqing Kong, Grant Schoenebeck

    Abstract: Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, like multiple-choice or scalar numbers. We aim to broaden these techniques to the larger domain of text-based reports, drawing on the recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms as t…

    Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by the Twenty-Fifth ACM Conference on Economics and Computation (EC'24)

  8. arXiv:2405.09463  [pdf, other]

    cs.CV

    Gaze-DETR: Using Expert Gaze to Reduce False Positives in Vulvovaginal Candidiasis Screening

    Authors: Yan Kong, Sheng Wang, Jiangdong Cai, Zihao Zhao, Zhenrong Shen, Yonghao Li, Manman Fei, Qian Wang

    Abstract: Accurate detection of vulvovaginal candidiasis is critical for women's health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveals that areas garnering sustained attention - yet not marked by experts after deliberation - are often aligned with false positi…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: MICCAI-2024 early accept. Our code is available at https://github.com/YanKong0408/Gaze-DETR

  9. arXiv:2405.09457  [pdf, ps, other]

    cond-mat.stat-mech cs.CC math.CO

    Recurrence solution of monomer-polymer models on two-dimensional rectangular lattices

    Authors: Yong Kong

    Abstract: The problem of counting polymer coverings on the rectangular lattices is investigated. In this model, a linear rigid polymer covers $k$ adjacent lattice sites such that no two polymers occupy a common site. Those unoccupied lattice sites are considered as monomers. We prove that for a given number of polymers ($k$-mers), the number of arrangements for the polymers on two-dimensional rectangular la…

    Submitted 15 May, 2024; originally announced May 2024.

    MSC Class: 05A15 (Primary) 82B20; 03D15 (Secondary)

    ACM Class: F.1.3

  10. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages. Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  11. arXiv:2404.18687  [pdf, other]

    cs.RO eess.SY

    Socially Adaptive Path Planning Based on Generative Adversarial Network

    Authors: Yao Wang, Yuqi Kong, Wenzheng Chi, Lining Sun

    Abstract: The natural interaction between robots and pedestrians in the process of autonomous navigation is crucial for the intelligent development of mobile robots, which requires robots to fully consider social rules and guarantee the psychological comfort of pedestrians. Among the research results in the field of robotic path planning, the learning-based socially adaptive algorithms have performed well i…

    Submitted 29 April, 2024; originally announced April 2024.

  12. arXiv:2404.05052  [pdf, other]

    cs.CV

    Facial Affective Behavior Analysis with Instruction Tuning

    Authors: Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong

    Abstract: Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images. However, traditional approaches primarily deploy models to discriminate among discrete emotion categories, and lack the fine granularity and reasoning capability for complex facial behaviors. The advent of Multi-modal Large Language Models (MLLMs) has been proven successful in general visual und…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: V1.0

  13. arXiv:2403.10004  [pdf, other]

    cs.CV

    ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images

    Authors: Xiangtian Xue, Jiasong Wu, Youyong Kong, Lotfi Senhadji, Huazhong Shu

    Abstract: We present a novel image editing scenario termed Text-grounded Object Generation (TOG), defined as generating a new object in the real image spatially conditioned by textual descriptions. Existing diffusion models exhibit limitations of spatial perception in complex real-world scenes, relying on additional modalities to enforce constraints, and TOG imposes heightened challenges on scene comprehens…

    Submitted 15 March, 2024; originally announced March 2024.

  14. arXiv:2403.09128  [pdf, other]

    cs.CV

    Rethinking Referring Object Removal

    Authors: Xiangtian Xue, Jiasong Wu, Youyong Kong, Lotfi Senhadji, Huazhong Shu

    Abstract: Referring object removal refers to removing the specific object in an image referred by natural language expressions and filling the missing region with reasonable semantics. To address this task, we construct the ComCOCO, a synthetic dataset consisting of 136,495 referring expressions for 34,615 objects in 23,951 image pairs. Each pair contains an image with referring expressions and the ground t…

    Submitted 14 March, 2024; originally announced March 2024.

  15. arXiv:2403.08222  [pdf, other]

    cs.LG cs.AI

    Robust Decision Aggregation with Adversarial Experts

    Authors: Yongkang Guo, Yuqing Kong

    Abstract: We consider a binary decision aggregation problem in the presence of both truthful and adversarial experts. The truthful experts will report their private signals truthfully with proper incentive, while the adversarial experts can report arbitrarily. The decision maker needs to design a robust aggregator to forecast the true state of the world based on the reports of experts. The decision maker do…

    Submitted 12 March, 2024; originally announced March 2024.

  16. arXiv:2403.08157  [pdf]

    cs.CV

    Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks

    Authors: Fuzhi Wu, Jiasong Wu, Youyong Kong, Chunfeng Yang, Guanyu Yang, Huazhong Shu, Guy Carrault, Lotfi Senhadji

    Abstract: Deep learning and Convolutional Neural Networks (CNNs) have driven major transformations in diverse research areas. However, their limitations in handling low-frequency information present obstacles in certain tasks like interpreting global structures or managing smooth transition images. Despite the promising performance of transformer structures in numerous tasks, their intricate optimization co…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 9 pages, 10 figures, 6 tables. AAAI 2024 conference

  17. arXiv:2403.03412  [pdf, other]

    cs.LG cs.CV

    Advancing Out-of-Distribution Detection through Data Purification and Dynamic Activation Function Design

    Authors: Yingrui Ji, Yao Zhu, Zhigang Li, Jiansheng Chen, Yunlong Kong, Jingbo Chen

    Abstract: In the dynamic realms of machine learning and deep learning, the robustness and reliability of models are paramount, especially in critical real-world applications. A fundamental challenge in this sphere is managing Out-of-Distribution (OOD) samples, significantly increasing the risks of model misclassification and uncertainty. Our work addresses this challenge by enhancing the detection and manag…

    Submitted 5 March, 2024; originally announced March 2024.

  18. arXiv:2402.18211  [pdf, other]

    cs.LG cs.CR

    Catastrophic Overfitting: A Potential Blessing in Disguise

    Authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin

    Abstract: Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness. Particularly noteworthy is the challenge posed by catastrophic overfitting (CO) in this field. Although existing FAT approaches have made strides in mitigating CO, the ascent of adversarial robustness occurs with a non-negligible decline in classi…

    Submitted 28 February, 2024; originally announced February 2024.

  19. arXiv:2402.14859  [pdf, other]

    cs.CR cs.AI cs.CY cs.LG

    The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

    Authors: Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Yu Kong, Tianlong Chen, Huan Liu

    Abstract: Due to their unprecedented ability to process and respond to various types of data, Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI). As these advanced generative models increasingly form collaborative networks for complex tasks, the integrity and security of these systems are crucial. Our paper, "The Wolf Within", explore…

    Submitted 2 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to workshop on ReGenAI@CVPR 2024

  20. DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

    Authors: Chong Zeng, Yue Dong, Pieter Peers, Youkang Kong, Hongzhi Wu, Xin Tong

    Abstract: This paper presents a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. While existing diffusion models already have the ability to generate images under any lighting condition, without additional guidance these models tend to correlate image content and lighting. Moreover, text prompts lack the necessary expressional power to describe det…

    Submitted 27 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to SIGGRAPH 2024. Project page: https://dilightnet.github.io/

    Journal ref: ACM SIGGRAPH 2024 Conference Proceedings

  21. arXiv:2402.06062  [pdf, ps, other]

    cs.GT math.ST

    Peer Expectation in Robust Forecast Aggregation: The Possibility/Impossibility

    Authors: Yuqing Kong

    Abstract: Recently, a growing literature studies a new forecast aggregation setting where each forecaster is additionally asked "what's your expectation for the average of other forecasters' forecasts?". However, most theoretical results in this setting focus on the scenarios where the additional second-order information helps optimally aggregate the forecasts. Here we adopt an adversarial approach and follow…

    Submitted 8 February, 2024; originally announced February 2024.

  22. arXiv:2402.05947  [pdf, other]

    cs.LG cs.CV

    Separable Multi-Concept Erasure from Diffusion Models

    Authors: Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Yuqiu Kong, Baocai Yin

    Abstract: Large-scale diffusion models, known for their impressive image generation capabilities, have raised concerns among researchers regarding social impacts, such as the imitation of copyrighted artistic styles. In response, existing approaches turn to machine unlearning techniques to eliminate unsafe concepts from pre-trained models. However, these methods compromise the generative performance and neg…

    Submitted 3 February, 2024; originally announced February 2024.

  23. arXiv:2402.05642  [pdf, ps, other]

    eess.IV cs.CV

    An Optimization-based Baseline for Rigid 2D/3D Registration Applied to Spine Surgical Navigation Using CMA-ES

    Authors: Minheng Chen, Tonglong Li, Zhirun Zhang, Youyong Kong

    Abstract: A robust and efficient optimization-based 2D/3D registration framework is crucial for the navigation system of orthopedic surgical robots. It can provide precise position information of surgical instruments and implants during surgery. While artificial intelligence technology has advanced rapidly in recent years, traditional optimization-based registration methods remain indispensable in the field…

    Submitted 8 February, 2024; originally announced February 2024.

  24. arXiv:2402.02498  [pdf, other]

    eess.IV cs.AI cs.CV

    Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion

    Authors: Minheng Chen, Zhirun Zhang, Shuheng Gu, Zhangyang Ge, Youyong Kong

    Abstract: Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions. In recent years, some learning-based fully differentiable methods have produced beneficial outcomes while the process of feature extraction and gradient flow transmission still lack controllability and interpretability. To alleviate these problems, in this work, we propose a novel fully dif…

    Submitted 15 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: ISBI 2024

  25. arXiv:2401.17743  [pdf, other]

    cs.LG cs.GT

    Algorithmic Robust Forecast Aggregation

    Authors: Yongkang Guo, Jason D. Hartline, Zhihuan Huang, Yuqing Kong, Anant Shah, Fang-Yi Yu

    Abstract: Forecast aggregation combines the predictions of multiple forecasters to improve accuracy. However, the lack of knowledge about forecasters' information structure hinders optimal aggregation. Given a family of information structures, robust forecast aggregation aims to find the aggregator with minimal worst-case regret compared to the omniscient aggregator. Previous approaches for robust forecast…

    Submitted 31 January, 2024; originally announced January 2024.

  26. Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

    Authors: Cunhang Fan, Yujie Chen, Jun Xue, Yonghui Kong, Jianhua Tao, Zhao Lv

    Abstract: In recent years, knowledge graph completion (KGC) models based on pre-trained language model (PLM) have shown promising results. However, the large number of parameters and high computational cost of PLM models pose challenges for their application in downstream tasks. This paper proposes a progressive distillation method based on masked generation features for KGC task, aiming to significantly re…

    Submitted 10 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI2024

    Journal ref: (2024) Vol. 38 No. 8: AAAI-24 Technical Tracks 8, Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8380-8388

  27. arXiv:2401.07271  [pdf, other]

    cs.CV cs.AI

    SpineCLUE: Automatic Vertebrae Identification Using Contrastive Learning and Uncertainty Estimation

    Authors: Sheng Zhang, Minheng Chen, Junxian Wu, Ziyue Zhang, Tonglong Li, Cheng Xue, Youyong Kong

    Abstract: Vertebrae identification in arbitrary fields-of-view plays a crucial role in diagnosing spine disease. Most spine CT contain only local regions, such as the neck, chest, and abdomen. Therefore, identification should not depend on specific vertebrae or a particular number of vertebrae being visible. Existing methods at the spine-level are unable to meet this challenge. In this paper, we propose a t…

    Submitted 14 January, 2024; originally announced January 2024.

  28. arXiv:2312.14408  [pdf]

    cs.CY

    Extended p-median problems for balancing service efficiency and equality

    Authors: Yunfeng Kong, Chenchen Lian, Guangli Zhang, Shiyan Zhai

    Abstract: This article deals with the location problem for balancing the service efficiency and equality. In public service systems, some people may feel envy in case that they need longer travel distance to access services than others. The strength of the envy can be measured by comparing one's travel distance to service facility with a threshold distance. Using the total envy function, four extended p-med…

    Submitted 25 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 38 pages, 4 tables, 5 figures

    MSC Class: 90C27

    ACM Class: J.6

  29. arXiv:2312.12142  [pdf, other]

    cs.CV cs.AI

    FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

    Authors: Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin

    Abstract: Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based ima…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024; Github Page: https://github.com/yeungchenwa/FontDiffuser

    Journal ref: 38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024

  30. arXiv:2312.07269  [pdf, other]

    cs.GT

    Calibrating "Cheap Signals" in Peer Review without a Prior

    Authors: Yuxuan Lu, Yuqing Kong

    Abstract: Peer review lies at the core of the academic process, but even well-intentioned reviewers can still provide noisy ratings. While ranking papers by average ratings may reduce noise, varying noise levels and systematic biases stemming from "cheap" signals (e.g. author identity, proof length) can lead to unfairness. Detecting and correcting bias is challenging, as ratings are subjective and unverif…

    Submitted 12 December, 2023; originally announced December 2023.

  31. arXiv:2312.05602  [pdf, other]

    cs.CV cs.AI

    EipFormer: Emphasizing Instance Positions in 3D Instance Segmentation

    Authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin

    Abstract: 3D instance segmentation plays a crucial role in comprehending 3D scenes. Despite recent advancements in this field, existing approaches exhibit certain limitations. These methods often rely on fixed instance positions obtained from sampled representative points in vast 3D point clouds, using center prediction or farthest point sampling. However, these selected positions may deviate from actual in…

    Submitted 9 December, 2023; originally announced December 2023.

  32. arXiv:2311.15203  [pdf, ps, other]

    cs.GT

    Learning against Non-credible Auctions

    Authors: Qian Wang, Xuanzhi Xia, Zongjun Yang, Xiaotie Deng, Yuqing Kong, Zhilin Zhang, Liang Wang, Chuan Yu, Jian Xu, Bo Zheng

    Abstract: The standard framework of online bidding algorithm design assumes that the seller commits himself to faithfully implementing the rules of the adopted auction. However, the seller may attempt to cheat in execution to increase his revenue if the auction belongs to the class of non-credible auctions. For example, in a second-price auction, the seller could create a fake bid between the highest bid an…

    Submitted 26 November, 2023; originally announced November 2023.

  33. arXiv:2311.14094  [pdf, other]

    cs.GT cs.LG

    Robust Decision Aggregation with Second-order Information

    Authors: Yuqi Pan, Zhaohua Chen, Yuqing Kong

    Abstract: We consider a decision aggregation problem with two experts who each make a binary recommendation after observing a private signal about an unknown binary world state. An agent, who does not know the joint information structure between signals and states, sees the experts' recommendations and aims to match the action with the true state. Under the scenario, we study whether supplemented additional…

    Submitted 23 November, 2023; originally announced November 2023.

  34. arXiv:2311.11473   

    cs.LG cs.AI

    CSGNN: Conquering Noisy Node labels via Dynamic Class-wise Selection

    Authors: Yifan Li, Zhen Tan, Kai Shu, Zongsheng Cao, Yu Kong, Huan Liu

    Abstract: Graph Neural Networks (GNNs) have emerged as a powerful tool for representation learning on graphs, but they often suffer from overfitting and label noise issues, especially when the data is scarce or imbalanced. Different from the paradigm of previous methods that rely on single-node confidence, in this paper, we introduce a novel Class-wise Selection for Graph Neural Networks, dubbed CSGNN, whic…

    Submitted 14 December, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

    Comments: For the privacy issue

  35. arXiv:2311.11315  [pdf, other]

    cs.AI

    TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

    Authors: Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao

    Abstract: Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that require a blend of task planning and the utilization of external tools, such as APIs. However, real-world complex systems present three prevalent challenges concerning task planning and tool usage: (1) The real system usually has a vast…

    Submitted 19 November, 2023; originally announced November 2023.

  36. arXiv:2310.16717  [pdf, other]

    cs.CV

    Prompt-Driven Building Footprint Extraction in Aerial Images with Offset-Building Model

    Authors: Kai Li, Yupeng Deng, Yunlong Kong, Diyou Liu, Jingbo Chen, Yu Meng, Junxian Ma

    Abstract: More accurate extraction of invisible building footprints from very-high-resolution (VHR) aerial images relies on roof segmentation and roof-to-footprint offset extraction. Existing state-of-the-art methods based on instance segmentation suffer from poor generalization when extended to large-scale data production and fail to achieve low-cost human interactive annotation. The latest prompt paradigm…

    Submitted 11 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    ACM Class: I.4.6; I.4.7; I.3.5; I.5.1

  37. arXiv:2310.11857  [pdf, other]

    cs.GT

    Multistable Perception, False Consensus, and Information Complements

    Authors: Yuqing Kong

    Abstract: This paper presents a distributed communication model to investigate multistable perception, where a stimulus gives rise to multiple competing perceptual interpretations. We formalize stable perception as consensus achieved through components exchanging information. Our key finding is that relationships between components influence monostable versus multistable perceptions. When components contain…

    Submitted 18 October, 2023; originally announced October 2023.

  38. arXiv:2310.09978   

    cs.CV cs.AI cs.LG

    Chinese Painting Style Transfer Using Deep Generative Models

    Authors: Weijian Ma, Yanyang Kong

    Abstract: Artistic style transfer aims to modify the style of the image while preserving its content. Style transfer using deep learning models has been widely studied since 2015, and most of the applications are focused on specific artists like Van Gogh, Monet, Cezanne. There are few researches and applications on traditional Chinese painting style transfer. In this paper, we will study and leverage differ…

    Submitted 17 October, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: Paper is too old (written in 2019)

  39. arXiv:2309.10711  [pdf, other]

    cs.CV

    Latent Space Energy-based Model for Fine-grained Open Set Recognition

    Authors: Wentao Bao, Qi Yu, Yu Kong

    Abstract: Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes. A recent trend in OSR shows the benefit of generative models to discriminative unknown detection. As a type of generative model, energy-based models (EBM) are the potential for hybrid modeling of generative and discriminative tasks.…

    Submitted 29 October, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Add ack

  40. arXiv:2309.09887  [pdf, other]

    cs.CV cs.LG

    On Model Explanations with Transferable Neural Pathways

    Authors: Xinmiao Lin, Wentao Bao, Qi Yu, Yu Kong

    Abstract: Neural pathways as model explanations consist of a sparse set of neurons that provide the same level of prediction performance as the whole model. Existing methods primarily focus on accuracy and sparsity but the generated pathways may offer limited interpretability thus fall short in explaining the model behavior. In this paper, we suggest two interpretability criteria of neural pathways: (i) sam…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Arxiv preprint

  41. arXiv:2309.02290  [pdf, other]

    cs.CV

    ATM: Action Temporality Modeling for Video Question Answering

    Authors: Junwen Chen, Jie Zhu, Yu Kong

    Abstract: Despite significant progress in video question answering (VideoQA), existing methods fall short of questions that require causal/temporal reasoning across frames. This can be attributed to imprecise motion representations. We introduce Action Temporality Modeling (ATM) for temporality reasoning via three-fold uniqueness: (1) rethinking the optical flow and realizing that optical flow is effective…

    Submitted 5 September, 2023; originally announced September 2023.

  42. arXiv:2308.16413  [pdf, other]

    cs.MM

    Edge-Assisted On-Device Model Update for Video Analytics in Adverse Environments

    Authors: Yuxin Kong, Peng Yang, Yan Cheng

    Abstract: While large deep neural networks excel at general video analytics tasks, the significant demand on computing capacity makes them infeasible for real-time inference on resource-constrained end cameras. In this paper, we propose an edge-assisted framework that continuously updates the lightweight model deployed on the end cameras to achieve accurate predictions in adverse environments. This framewo…

    Submitted 30 August, 2023; originally announced August 2023.

  43. arXiv:2308.14575  [pdf, other]

    cs.CV

    Referring Image Segmentation Using Text Supervision

    Authors: Fang Liu, Yuhao Liu, Yuqiu Kong, Ke Xu, Lihe Zhang, Baocai Yin, Gerhard Hancke, Rynson Lau

    Abstract: Existing Referring Image Segmentation (RIS) methods typically require expensive pixel-level or box-level annotations for supervision. In this paper, we observe that the referring texts used in RIS already provide sufficient information to localize the target object. Hence, we propose a novel weakly-supervised RIS framework to formulate the target localization problem as a classification process to…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  44. arXiv:2308.12857  [pdf, other]

    cs.LG

    Fast Adversarial Training with Smooth Convergence

    Authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin

    Abstract: Fast adversarial training (FAT) is beneficial for improving the adversarial robustness of neural networks. However, previous FAT work has encountered a significant issue known as catastrophic overfitting when dealing with large perturbation budgets, i.e., the adversarial robustness of models declines to near zero during training. To address this, we analyze the training process of prior FAT work a…

    Submitted 24 August, 2023; originally announced August 2023.

    Journal ref: ICCV2023

  45. arXiv:2307.10580  [pdf, other]

    cs.LG physics.ao-ph

    Intelligent model for offshore China sea fog forecasting

    Authors: Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang

    Abstract: Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods are often proven inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using t…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 19 pages, 9 figures

  46. arXiv:2307.08243  [pdf, other]

    cs.CV

    Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

    Authors: Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong

    Abstract: Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications. In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space…

    Submitted 16 September, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: ICCV 2023 Accepted (Camera Ready)

  47. arXiv:2306.02558  [pdf, other]

    cs.CV

    Multi-View Representation is What You Need for Point-Cloud Pre-Training

    Authors: Siming Yan, Chen Song, Youkang Kong, Qixing Huang

    Abstract: A promising direction for pre-training 3D point clouds is to leverage the massive amount of data in 2D, whereas the domain gap between 2D and 3D creates a fundamental challenge. This paper proposes a novel approach to point-cloud pre-training that learns 3D representations by leveraging pre-trained 2D networks. Different from the popular practice of predicting 2D features first and then obtaining…

    Submitted 28 April, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Published in ICLR 2024

  48. arXiv:2305.14428  [pdf, other]

    cs.CV

    Prompting Language-Informed Distribution for Compositional Zero-Shot Learning

    Authors: Wentao Bao, Lichang Chen, Heng Huang, Yu Kong

    Abstract: Compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts, e.g., sliced tomatoes, where the model is learned only from the seen compositions, e.g., sliced potatoes and red tomatoes. Thanks to the prompt tuning on large pre-trained visual language models such as CLIP, recent literature shows impressively better CZSL performance than traditional vision-based…

    Submitted 30 September, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  49. arXiv:2305.06252  [pdf, other]

    cs.CV eess.IV

    Embedded Feature Similarity Optimization with Specific Parameter Initialization for 2D/3D Medical Image Registration

    Authors: Minheng Chen, Zhirun Zhang, Shuheng Gu, Youyong Kong

    Abstract: We present a novel deep learning-based framework: Embedded Feature Similarity Optimization with Specific Parameter Initialization (SOPI) for 2D/3D medical image registration which is a most challenging problem due to the difficulty such as dimensional mismatch, heavy computation load and lack of golden evaluation standard. The framework we design includes a parameter specification module to effici…

    Submitted 19 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 14 pages, 5 figures, accepted by ICASSP 2024

  50. arXiv:2305.02541  [pdf, other]

    cs.CV cs.GT

    Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder

    Authors: Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong

    Abstract: The popular VQ-VAE models reconstruct images through learning a discrete codebook but suffer from a significant issue in the rapid quality degradation of image reconstruction as the compression rate rises. One major reason is that a higher compression rate induces more loss of visual signals on the higher frequency spectrum which reflect the details on pixel space. In this paper, a Frequency Compl…

    Submitted 3 November, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted by CVPR 2023, code available at https://github.com/oppo-us-research/FA-VAE