Showing 1–50 of 127 results for author: Peng, P

  1. arXiv:2406.17745  [pdf, ps, other]

    cs.IR cs.LG

    Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search

    Authors: Pai Peng, Quanxiang Jia, Ziqiang Zhou, Shuang Hong, Zichong Xiao

    Abstract: Click-through-rate (CTR) prediction has an essential impact on improving user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods are well exploited to utilize graph structure extracted from user behaviors and other information to help embedding learning. However, most of the previous graph-based methods mainly focus on recommendation scenarios,…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures

    ACM Class: H.3.3

  2. arXiv:2406.09272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    Authors: Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman

    Abstract: Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinat…

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://vision.cs.utexas.edu/projects/action2sound

  3. arXiv:2406.04070  [pdf, other]

    cs.LG cs.AI

    Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection

    Authors: Yinting Wu, Pai Peng, Bo Cai, Le Li

    Abstract: Adversarial training methods commonly generate independent initial perturbation for adversarial samples from a simple uniform distribution, and obtain the training batch for the classifier without selection. In this work, we propose a simple yet effective training framework called Batch-in-Batch (BB) to enhance models robustness. It involves specifically a joint construction of initial values that…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 29 pages, 11 figures

  4. arXiv:2405.09291  [pdf, other]

    cs.CV cs.AI eess.IV

    Sensitivity Decouple Learning for Image Compression Artifacts Reduction

    Authors: Li Ma, Yifan Zhao, Peixi Peng, Yonghong Tian

    Abstract: With the benefit of deep learning techniques, recent researches have made significant progress in image compression artifacts reduction. Despite their improved performances, prevailing methods only focus on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing ta…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by Transactions on Image Processing

  5. arXiv:2405.08593  [pdf, other]

    cs.CV

    Open-Vocabulary Object Detection via Neighboring Region Attention Alignment

    Authors: Sunyuan Qiang, Xianfei Li, Yanyan Liang, Wenlong Liao, Tao He, Pai Peng

    Abstract: The nature of diversity in real-world environments necessitates neural network models to expand from closed category settings to accommodate novel emerging categories. In this paper, we study the open-vocabulary object detection (OVD), which facilitates the detection of novel object classes under the supervision of only base annotations and open-vocabulary knowledge. However, we find that the inad…

    Submitted 14 May, 2024; originally announced May 2024.

  6. arXiv:2404.16464  [pdf, other]

    cs.SI

    Sublinear-Time Opinion Estimation in the Friedkin--Johnsen Model

    Authors: Stefan Neumann, Yinhao Dong, Pan Peng

    Abstract: Online social networks are ubiquitous parts of modern societies and the discussions that take place in these networks impact people's opinions on diverse topics, such as politics or vaccination. One of the most popular models to formally describe this opinion formation process is the Friedkin--Johnsen (FJ) model, which allows to define measures, such as the polarization and the disagreement of a n…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear at the 2024 ACM Web Conference

  7. arXiv:2404.06287  [pdf, other]

    cs.CV cs.LG

    Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training

    Authors: Ming-Kun Xie, Jia-Hao Xiao, Pei Peng, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang

    Abstract: The key to multi-label image classification (MLC) is to improve model performance by leveraging label correlations. Unfortunately, it has been shown that overemphasizing co-occurrence relationships can cause the overfitting issue of the model, ultimately leading to performance degradation. In this paper, we provide a causal inference framework to show that the correlative features caused by the ta…

    Submitted 12 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  8. arXiv:2404.00886  [pdf, other]

    cs.AI

    MTLight: Efficient Multi-Task Reinforcement Learning for Traffic Signal Control

    Authors: Liwen Zhu, Peixi Peng, Zongqing Lu, Yonghong Tian

    Abstract: Traffic signal control has a great impact on alleviating traffic congestion in modern cities. Deep reinforcement learning (RL) has been widely used for this task in recent years, demonstrating promising performance but also facing many challenges such as limited performances and sample inefficiency. To handle these challenges, MTLight is proposed to enhance the agent observation with a latent stat…

    Submitted 31 March, 2024; originally announced April 2024.

  9. arXiv:2403.16973  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

    Authors: Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath

    Abstract: We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an…

    Submitted 13 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: ACL 2024. Data, code, and model weights are available at https://github.com/jasonppy/VoiceCraft

  10. arXiv:2403.14332  [pdf, ps, other]

    cs.DS cs.CR cs.LG

    A Differentially Private Clustering Algorithm for Well-Clustered Graphs

    Authors: Weiqiang He, Hendrik Fichtenberger, Pan Peng

    Abstract: We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient ($ε$,$δ$)…

    Submitted 21 March, 2024; originally announced March 2024.

  11. arXiv:2403.04162  [pdf, other]

    cs.LG cs.NE

    Noisy Spiking Actor Network for Exploration

    Authors: Ding Chen, Peixi Peng, Tiejun Huang, Yonghong Tian

    Abstract: As a general method for exploration in deep reinforcement learning (RL), NoisyNet can produce problem-specific exploration strategies. Spiking neural networks (SNNs), due to their binary firing mechanism, have strong robustness to noise, making it difficult to realize efficient exploration with local disturbances. To solve this exploration problem, we propose a noisy spiking actor network (NoisySA…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures

  12. arXiv:2403.03382  [pdf, other]

    cs.AI

    Adaptive Discovering and Merging for Incremental Novel Class Discovery

    Authors: Guangyao Chen, Peixi Peng, Yangru Huang, Mengyue Geng, Yonghong Tian

    Abstract: One important desideratum of lifelong learning aims to discover novel classes from unlabelled data in a continuous manner. The central challenge is twofold: discovering and learning novel classes while mitigating the issue of catastrophic forgetting of established knowledge. To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptive…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: AAAI 2024. arXiv admin note: text overlap with arXiv:2207.08605 by other authors

  13. arXiv:2402.06959  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

    Authors: Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath

    Abstract: The recently proposed visually grounded speech model SpeechCLIP is an innovative framework that bridges speech and text through images via CLIP without relying on text transcription. On this basis, this paper introduces two extensions to SpeechCLIP. First, we apply the Continuous Integrate-and-Fire (CIF) module to replace a fixed number of CLS tokens in the cascaded architecture. Second, we propos…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop

  14. arXiv:2402.05819  [pdf, other]

    eess.AS cs.CL cs.LG

    Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

    Authors: Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

    Abstract: Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in the real-wo…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024 workshop on Self-supervision in Audio, Speech, and Beyond (SASB)

  15. arXiv:2402.01591  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    BAT: Learning to Reason about Spatial Sounds with Large Language Models

    Authors: Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

    Abstract: Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene analysis model with the natural language reasoning capabilities of a large language model (LLM) to replicate this innate ability. To address the lack of existing da…

    Submitted 25 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. Our demo, dataset, code and model weights are available at: https://zhishengzheng.com/BAT

  16. arXiv:2401.05444  [pdf, other]

    cs.NE cs.AI cs.LG

    Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning

    Authors: Ding Chen, Peixi Peng, Tiejun Huang, Yonghong Tian

    Abstract: With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. It provides a promising energy-efficient way for realistic control tasks by combining SNNs with deep reinforcement learning (DRL). In this paper, we focus on the task where the agent needs to learn multi-dimensional deterministic policies…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 13 pages, 6 figures

  17. arXiv:2312.06988  [pdf, other]

    cs.CV

    MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving

    Authors: Guangfeng Jiang, Jun Liu, Yuzhi Wu, Wenlong Liao, Tao He, Pai Peng

    Abstract: Instance segmentation is a fundamental research in computer vision, especially in autonomous driving. However, manual mask annotation for instance segmentation is quite time-consuming and costly. To address this problem, some prior works attempt to apply weakly supervised manner by exploring 2D or 3D boxes. However, no one has ever successfully segmented 2D and 3D instances simultaneously by only…

    Submitted 17 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: AAAI2024

  18. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  19. arXiv:2310.17878  [pdf, other]

    cs.DS cs.LG cs.SI

    A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time

    Authors: Ranran Shen, Pan Peng

    Abstract: We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that bo…

    Submitted 29 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: To appear at NeurIPS'23

  20. arXiv:2310.11305  [pdf, other]

    cs.AI cs.LG

    MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

    Authors: Ti-Rong Wu, Hung Guei, Pei-Chiun Peng, Po-Wei Huang, Ting Han Wei, Chung-Chin Shih, Yun-Jui Tsai

    Abstract: This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms, including AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero. While these algorithms have demonstrated super-human performance in many games, it remains unclear which among them is most suitable or efficient for specific tasks. Through MiniZero, we systematically evaluate the perfo…

    Submitted 26 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Transactions on Games

  21. arXiv:2310.07654  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve…

    Submitted 11 October, 2023; originally announced October 2023.

  22. arXiv:2309.10787  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

    Authors: Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

    Abstract: Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual a…

    Submitted 19 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024; Evaluation Code: https://github.com/roger-tseng/av-superb Submission Platform: https://av.superbbenchmark.org

  23. arXiv:2307.07389  [pdf, other]

    cs.LG

    Learning Sparse Neural Networks with Identity Layers

    Authors: Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan, Yonghong Tian

    Abstract: The sparsity of Deep Neural Networks is well investigated to maximize the performance and reduce the size of overparameterized networks as possible. Existing methods focus on pruning parameters in the training process by using thresholds and metrics. Meanwhile, feature similarity between different layers has not been discussed sufficiently before, which could be rigorously proved to be highly corr…

    Submitted 14 July, 2023; originally announced July 2023.

  24. arXiv:2307.01218  [pdf, ps, other]

    cs.DS

    Effective Resistances in Non-Expander Graphs

    Authors: Dongrun Cai, Xue Chen, Pan Peng

    Abstract: Effective resistances are ubiquitous in graph algorithms and network analysis. In this work, we study sublinear time algorithms to approximate the effective resistance of an adjacent pair $s$ and $t$. We consider the classical adjacency list model for local algorithms. While recent works have provided sublinear time algorithms for expander graphs, we prove several lower bounds for general graphs o…

    Submitted 1 July, 2023; originally announced July 2023.

  25. arXiv:2307.00530  [pdf, other]

    cs.DS cs.DC

    Massively Parallel Algorithms for the Stochastic Block Model

    Authors: Zelin Li, Pan Peng, Xianbin Zhu

    Abstract: Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. We study the problem of exactly recovering the communities in a graph generated from the Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model. Specifically, given $kn$ vertices that are partitioned into $k$ equal-sized clusters (i.e., each…

    Submitted 14 August, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    MSC Class: 68W15

  26. arXiv:2306.15644  [pdf, other]

    cs.CL

    Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

    Authors: Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh Jha, Diego Romeres, Jonathan Le Roux

    Abstract: To realize human-robot collaboration, robots need to execute actions for new tasks according to human instructions given finite prior knowledge. Human experts can share their knowledge of how to perform a task with a robot through multi-modal instructions in their demonstrations, showing a sequence of short-horizon steps to achieve a long-horizon goal. This paper introduces a method for robot acti…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech2023

  27. arXiv:2306.15266  [pdf, other]

    cs.AI

    Internal Contrastive Learning for Generalized Out-of-distribution Fault Diagnosis (GOOFD) Framework

    Authors: Xingyue Wang, Hanrong Zhang, Ke Ma, Shuting Tao, Peng Peng, Hongwei Wang

    Abstract: Fault diagnosis is essential in industrial processes for monitoring the conditions of important machines. With the ever-increasing complexity of working conditions and demand for safety during production and operation, different diagnosis methods are required, and more importantly, an integrated fault diagnosis system that can cope with multiple tasks is highly desired. However, the diagnosis subt…

    Submitted 27 June, 2023; originally announced June 2023.

  28. arXiv:2306.14701  [pdf, other]

    cs.LG cs.AI

    Hard Sample Mining Enabled Supervised Contrastive Feature Learning for Wind Turbine Pitch System Fault Diagnosis

    Authors: Zixuan Wang, Bo Qin, Mengxuan Li, Chenlu Zhan, Mark D. Butala, Peng Peng, Hongwei Wang

    Abstract: The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple health conditions in the pitch system due to the long-term wear and tear poses challenges in accurately classifying them, thus increasing the maintenance cost of wind turbines or even damaging them.…

    Submitted 10 August, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  29. arXiv:2306.05236  [pdf, other]

    cs.CV

    Population-Based Evolutionary Gaming for Unsupervised Person Re-identification

    Authors: Yunpeng Zhai, Peixi Peng, Mengxi Jia, Shiyong Li, Weiqiang Chen, Xuesong Gao, Yonghong Tian

    Abstract: Unsupervised person re-identification has achieved great success through the self-improvement of individual neural networks. However, limited by the lack of diversity of discriminant information, a single network has difficulty learning sufficient discrimination ability by itself under unsupervised conditions. To address this limit, we develop a population-based evolutionary gaming (PEG) framework…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted in IJCV

  30. arXiv:2305.13089  [pdf, ps, other]

    cs.DS

    An Optimal Separation between Two Property Testing Models for Bounded Degree Directed Graphs

    Authors: Pan Peng, Yuyang Wang

    Abstract: We revisit the relation between two fundamental property testing models for bounded-degree directed graphs: the bidirectional model in which the algorithms are allowed to query both the outgoing edges and incoming edges of a vertex, and the unidirectional model in which only queries to the outgoing edges are allowed. Czumaj, Peng and Sohler [STOC 2016] showed that for directed graphs with both max…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: To appear in ICALP 2023

  31. arXiv:2305.11435  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

    Authors: Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

    Abstract: In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective. We demonstrate that a nearly identical model architecture (HuBERT) trained with a masked language modeling loss does not exhibit this same ability, suggesting that the visual grounding objective is responsible for the emergence of thi…

    Submitted 23 July, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023. Code & Model: https://github.com/jasonppy/syllable-discovery

  32. arXiv:2305.11095  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

    Authors: Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath

    Abstract: We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks: audio-visual speech recognition (AVSR), code-switched speech recognition (CS-ASR), and speech translation (ST) on unseen language pairs. We design task-specific prompts, by either leveraging another large-scale model, or sim…

    Submitted 15 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  33. Picking Up Quantization Steps for Compressed Image Classification

    Authors: Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian

    Abstract: The sensitivity of deep neural networks to compressed images hinders their usage in many real applications, which means classification networks may fail just after taking a screenshot and saving it as a compressed file. In this paper, we argue that neglected disposable coding parameters stored in compressed files could be picked up to reduce the sensitivity of deep neural networks to compressed im…

    Submitted 20 April, 2023; originally announced April 2023.

    Journal ref: in IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 4, pp. 1884-1898, April 2023

  34. arXiv:2304.03810  [pdf, ps, other]

    cs.LO cs.CC cs.DM cs.DS math.CO

    On Testability of First-Order Properties in Bounded-Degree Graphs and Connections to Proximity-Oblivious Testing

    Authors: Isolde Adler, Noleen Köhler, Pan Peng

    Abstract: We study property testing of properties that are definable in first-order logic (FO) in the bounded-degree graph and relational structure models. We show that any FO property that is defined by a formula with quantifier prefix $\exists^*\forall^*$ is testable (i.e., testable with constant query complexity), while there exists an FO property that is expressible by a formula with quantifier prefix…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: Preliminary version of this article appeared in SODA'21 (arXiv:2008.05800) and CCC'21 (arXiv:2105.08490)

  35. arXiv:2304.00426  [pdf, other]

    cs.CV

    Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning

    Authors: Zeyin Song, Yifan Zhao, Yujun Shi, Peixi Peng, Li Yuan, Yonghong Tian

    Abstract: Few-shot class-incremental learning (FSCIL) aims at learning to classify new classes continually from limited samples without forgetting the old classes. The mainstream framework tackling FSCIL is first to adopt the cross-entropy (CE) loss for training at the base session, then freeze the feature extractor to adapt to new classes. However, in this work, we find that the CE loss is not ideal for th…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  36. arXiv:2303.00486  [pdf, other]

    cs.DC cs.IT

    Redundancy Management for Fast Service (Rates) in Edge Computing Systems

    Authors: Pei Peng, Emina Soljanin

    Abstract: Edge computing operates between the cloud and end-users and strives to provide fast computing services for multiple users. Because of their proximity to users, edge services have a low communication delay and can provide low latency with sufficient computing and storage resources. However, edge computing and storage resources are limited. Thus, directing more resources to some computing jobs will…

    Submitted 5 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: This paper is submitted to IEEE/ACM Transactions on Networking

  37. An Order-Invariant and Interpretable Hierarchical Dilated Convolution Neural Network for Chemical Fault Detection and Diagnosis

    Authors: Mengxuan Li, Peng Peng, Min Wang, Hongwei Wang

    Abstract: Fault detection and diagnosis is significant for reducing maintenance costs and improving health and safety in chemical processes. Convolution neural network (CNN) is a popular deep learning algorithm with many successful applications in chemical fault detection and diagnosis tasks. However, convolution layers in CNN are very sensitive to the order of features, which can lead to instability in the…

    Submitted 13 February, 2023; originally announced February 2023.

  38. arXiv:2302.05929  [pdf, other]

    cs.LG

    SCLIFD: Supervised Contrastive Knowledge Distillation for Incremental Fault Diagnosis under Limited Fault Data

    Authors: Peng Peng, Hanrong Zhang, Mengxuan Li, Gongzhuang Peng, Hongwei Wang, Weiming Shen

    Abstract: Intelligent fault diagnosis has made extraordinary advancements currently. Nonetheless, few works tackle class-incremental learning for fault diagnosis under limited fault data, i.e., imbalanced and long-tailed fault diagnosis, which brings about various notable challenges. Initially, it is difficult to extract discriminative features from limited fault data. Moreover, a well-trained model must be…

    Submitted 12 February, 2023; originally announced February 2023.

  39. SCCAM: Supervised Contrastive Convolutional Attention Mechanism for Ante-hoc Interpretable Fault Diagnosis with Limited Fault Samples

    Authors: Mengxuan Li, Peng Peng, Jingxin Zhang, Hongwei Wang, Weiming Shen

    Abstract: In real industrial processes, fault diagnosis methods are required to learn from limited fault samples since the procedures are mainly under normal conditions and the faults rarely occur. Although attention mechanisms have become popular in the field of fault diagnosis, the existing attention-based methods are still unsatisfying for the above practical applications. First, pure attention-based arc…

    Submitted 17 February, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  40. arXiv:2301.11929  [pdf, other]

    cs.NE cs.AI

    Training Full Spike Neural Networks via Auxiliary Accumulation Pathway

    Authors: Guangyao Chen, Peixi Peng, Guoqi Li, Yonghong Tian

    Abstract: Due to the binary spike signals making converting the traditional high-power multiply-accumulation (MAC) into a low-power accumulation (AC) available, the brain-inspired Spiking Neural Networks (SNNs) are gaining more and more attention. However, the binary spike propagation of the Full-Spike Neural Networks (FSNN) with limited time steps is prone to significant information loss. To improve perfor…

    Submitted 26 January, 2023; originally announced January 2023.

  41. FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network

    Authors: Huafeng Liu, Pai Peng, Tao Chen, Qiong Wang, Yazhou Yao, Xian-Sheng Hua

    Abstract: Few-shot semantic segmentation is the task of learning to locate each pixel of the novel class in the query image with only a few annotated support images. The current correlation-based methods construct pair-wise feature correlations to establish the many-to-many matching because the typical prototype-based approaches cannot learn fine-grained correspondence relations. However, the existing metho…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: accepted by IEEE Transactions on Multimedia

  42. arXiv:2212.10729  [pdf, other

    cs.CV cs.AI cs.LG

    UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering

    Authors: Chenlu Zhan, Peng Peng, Hongsen Wang, Tao Chen, Hongwei Wang

    Abstract: Medical Visual Question Answering (Medical-VQA) aims to answer clinical questions regarding radiology images, assisting doctors with decision-making options. Nevertheless, current Medical-VQA models learn cross-modal representations through residing vision and texture encoders in dual separate spaces, which leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and In…

    Submitted 27 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  43. arXiv:2212.09039  [pdf, other

    cs.CV

    Automated Optical Inspection of FAST's Reflector Surface using Drones and Computer Vision

    Authors: Jianan Li, Shenwang Jiang, Liqiang Song, Peiran Peng, Feng Mu, Hui Li, Peng Jiang, Tingfa Xu

    Abstract: The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the world's largest single-dish radio telescope. Its large reflecting surface achieves unprecedented sensitivity but is prone to damage, such as dents and holes, caused by naturally-occurring falling objects. Hence, the timely and accurate detection of surface defects is crucial for FAST's stable operation. Conventional manual ins…

    Submitted 18 December, 2022; originally announced December 2022.

  44. arXiv:2211.02178  [pdf, other

    cs.CV cs.CL

    Zero-shot Video Moment Retrieval With Off-the-Shelf Models

    Authors: Anuj Diwan, Puyuan Peng, Raymond J. Mooney

    Abstract: For the majority of the machine learning community, the expensive nature of collecting high-quality human-annotated data and the inability to efficiently finetune very large state-of-the-art pretrained models on limited compute are major bottlenecks for building models for new tasks. We propose a simple zero-shot approach for one such task, Video Moment Retrieval (VMR), that does not perform any a…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted to the NeurIPS 2022 Workshop on Transfer Learning for NLP (TL4NLP). 12 pages, 5 figures

  45. arXiv:2210.12601  [pdf, ps, other

    cs.DS

    Sublinear-Time Algorithms for Max Cut, Max E2Lin$(q)$, and Unique Label Cover on Expanders

    Authors: Pan Peng, Yuichi Yoshida

    Abstract: We show sublinear-time algorithms for Max Cut and Max E2Lin$(q)$ on expanders in the adjacency list model that distinguishes instances with the optimal value more than $1-\varepsilon$ from those with the optimal value less than $1-ρ$ for $ρ\gg \varepsilon$. The time complexities for Max Cut and Max $2$Lin$(q)$ are $\widetilde{O}(\frac{1}{φ^2ρ} \cdot m^{1/2+O(\varepsilon/(φ^2ρ))})$ and…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: To appear in SODA'23

  46. arXiv:2210.10824  [pdf, other

    cs.LG

    Supervised Contrastive Learning with Tree-Structured Parzen Estimator Bayesian Optimization for Imbalanced Tabular Data

    Authors: Shuting Tao, Peng Peng, Qi Li, Hongwei Wang

    Abstract: Class imbalance has a detrimental effect on the predictive performance of most supervised learning algorithms as the imbalanced distribution can lead to a bias preferring the majority class. To solve this problem, we propose a Supervised Contrastive Learning (SCL) method with Tree-structured Parzen Estimator (TPE) technique for imbalanced tabular datasets. Contrastive learning (CL) can extract the…

    Submitted 26 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 28 pages, 6 figures

  47. arXiv:2210.02081  [pdf, other

    cs.CV

    Locate before Answering: Answer Guided Question Localization for Video Question Answering

    Authors: Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang

    Abstract: Video question answering (VideoQA) is an essential task in vision-language understanding, which has attracted considerable research attention recently. Nevertheless, existing works mostly achieve promising performances on short videos of duration within 15 seconds. For VideoQA on minute-level long-term videos, those methods are likely to fail because they lack the ability to deal with noise and redun…

    Submitted 12 October, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

  48. arXiv:2209.09768  [pdf, other

    cs.CL

    An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition

    Authors: Yang Wu, Pai Peng, Zhenyu Zhang, Yanyan Zhao, Bing Qin

    Abstract: Recent works on multi-modal emotion recognition move towards end-to-end models, which can extract the task-specific features supervised by the target task compared with the two-phase pipeline. However, previous methods only model the feature interactions between the textual and either acoustic or visual modalities, ignoring the feature interactions between the acoustic and visual modali…

    Submitted 20 September, 2022; originally announced September 2022.

  49. arXiv:2206.13813  [pdf, other

    cs.DS cs.LG cs.SI

    Sublinear-Time Clustering Oracle for Signed Graphs

    Authors: Stefan Neumann, Pan Peng

    Abstract: Social networks are often modeled using signed graphs, where vertices correspond to users and edges have a sign that indicates whether an interaction between users was positive or negative. The arising signed graphs typically contain a clear community structure in the sense that the graph can be partitioned into a small number of polarized communities, each defining a sparse cut and indivisible in…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: To appear at ICML'22

  50. arXiv:2206.11114  [pdf, other

    cs.AI cs.GT

    Evolutionary Game-Theoretical Analysis for General Multiplayer Asymmetric Games

    Authors: Xinyu Zhang, Peng Peng, Yushan Zhou, Haifeng Wang, Wenxin Li

    Abstract: Evolutionary game theory has been a successful tool to combine classical game theory with learning-dynamical descriptions in multiagent systems. Provided some symmetric structures of interacting players, many studies have been focused on using a simplified heuristic payoff table as input to analyse the dynamics of interactions. Nevertheless, even for the state-of-the-art method, there are two limi…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 10 pages, 5 figures