Showing 1–50 of 77 results for author: Mao, N

  1. arXiv:2406.00625  [pdf, other]

    cs.CV

    SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection

    Authors: Yun Peng, Xiao Lin, Nachuan Ma, Jiayuan Du, Chuangwei Liu, Chengju Liu, Qijun Chen

    Abstract: Visual anomaly detection is vital in real-world applications, such as industrial defect detection and medical diagnosis. However, most existing methods focus on local structural anomalies and fail to detect higher-level functional anomalies under logical conditions. Although recent studies have explored logical anomaly detection, they can only address simple anomalies like missing or addition and…

    Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  2. arXiv:2405.18003  [pdf, other]

    cs.CV cs.AI

    MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

    Authors: Bowen Zhang, Xiaofei Xie, Haotian Lu, Na Ma, Tianlin Li, Qing Guo

    Abstract: Diffusion-based video generation has achieved significant progress, yet generating multiple actions that occur sequentially remains a formidable task. Directly generating a video with sequential actions can be extremely challenging due to the scarcity of fine-grained action annotations and the difficulty in establishing temporal semantic correspondences and maintaining long-term consistency. To ta…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.16383  [pdf, other]

    cs.LG

    Rewarded Region Replay (R3) for Policy Learning with Discrete Action Space

    Authors: Bangzheng Li, Ningshan Ma, Zifan Wang

    Abstract: We introduce a new on-policy algorithm called Rewarded Region Replay (R3), which significantly improves on PPO in solving environments with discrete action spaces. R3 improves sample efficiency by using a replay buffer which contains past successful trajectories with reward above a certain threshold, which are used to update a PPO agent with importance sampling. Crucially, we discard the importanc…

    Submitted 25 May, 2024; originally announced May 2024.

    ACM Class: I.2.6

  4. arXiv:2405.06959  [pdf, other]

    cs.RO

    AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation

    Authors: Xingxu Li, Nan Ma, Yiheng Han, Shun Yang, Siyi Zheng

    Abstract: To address the limitations inherent to conventional automated harvesting robots, specifically their suboptimal success rates and risk of crop damage, we design a novel bot named AHPPEBot, which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, in phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA), 7 pages, 3 figures

  5. arXiv:2403.18058  [pdf, other]

    cs.CL cs.AI

    COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

    Authors: Yuelin Bai, Xinrun Du, Yiming Liang, Yonggang **, Ziqiang Liu, Junting Zhou, Tianyu Zheng, Xincheng Zhang, Nuo Ma, Zekun Wang, Ruibin Yuan, Haihong Wu, Hongquan Lin, Wenhao Huang, Jiajun Zhang, Wenhu Chen, Chenghua Lin, Jie Fu, Min Yang, Shiwen Ni, Ge Zhang

    Abstract: Recently, there have been significant advancements in large language models (LLMs), particularly focused on the English language. These advancements have enabled these LLMs to understand and execute complex instructions with unprecedented accuracy and fluency. However, despite these advancements, there remains a noticeable gap in the development of Chinese instruction tuning. The unique linguistic…

    Submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2402.17664  [pdf, other]

    cs.CV

    Bayesian Differentiable Physics for Cloth Digitalization

    Authors: Deshan Gong, Ningtao Mao, He Wang

    Abstract: We propose a new method for cloth digitalization. Deviating from existing methods which learn from data captured under relatively casual settings, we propose to learn from data captured in strictly tested measuring protocols, and find plausible physical parameters of the cloths. However, such data is currently absent, so we first propose a new dataset with accurate cloth measurements. Further, the…

    Submitted 11 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 9 pages, 8 figures, to be published in CVPR

    ACM Class: F.4.8; I.6.8

  7. arXiv:2402.13607  [pdf, other]

    cs.CV cs.CL

    CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

    Authors: Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

    Abstract: Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpret…

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  8. arXiv:2402.04883  [pdf, other]

    cs.CV

    Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

    Authors: Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang

    Abstract: Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces, as well as the accuracy of object localization within the 3D space. This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization. Different from previous met…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA2024

  9. arXiv:2402.02950  [pdf, other]

    cs.CR eess.SP

    Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

    Authors: Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

    Abstract: Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of tra…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 12 figures

  10. arXiv:2402.01269  [pdf, other]

    cs.CV

    Spectrum-guided Feature Enhancement Network for Event Person Re-Identification

    Authors: Hongchen Tan, Yi Zhang, ** Liu, Baocai Yin, Nan Ma, Xin Li, Huchuan Lu

    Abstract: As a cutting-edge biosensor, the event camera holds significant potential in the field of computer vision, particularly regarding privacy preservation. However, compared to traditional cameras, event streams often contain noise and possess extremely sparse semantics, posing a formidable challenge for event-based person re-identification (event Re-ID). To address this, we introduce a novel event pe…

    Submitted 2 February, 2024; originally announced February 2024.

  11. arXiv:2401.15647  [pdf, other]

    cs.CV cs.AI eess.IV

    UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration

    Authors: Nachuan Ma, Rui Fan, Lihua Xie

    Abstract: Over the past decade, automated methods have been developed to detect cracks more efficiently, accurately, and objectively, with the ultimate goal of replacing conventional manual visual inspection techniques. Among these methods, semantic segmentation algorithms have demonstrated promising results in pixel-wise crack detection tasks. However, training such networks requires a large amount of huma…

    Submitted 6 May, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  12. arXiv:2401.08740  [pdf, other]

    cs.CV cs.LG

    SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

    Authors: Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, Saining Xie

    Abstract: We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which allows for connecting two distributions in a more flexible way than standard diffusion models, makes possible a modular study of various design choices impacting generative models built on dynamical transport: using discrete vs. c…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Code available: https://github.com/willisma/SiT

  13. arXiv:2310.19817  [pdf, other]

    eess.AS cs.SD

    Intelligibility prediction with a pretrained noise-robust automatic speech recognition model

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other system is non-intrusive and makes predictions with derived ASR uncertainty. The ASR model is only pretrained with a…

    Submitted 20 October, 2023; originally announced October 2023.

  14. arXiv:2310.14184  [pdf, other]

    cs.CV

    Partition Speeds Up Learning Implicit Neural Representations Based on Exponential-Increase Hypothesis

    Authors: Ke Liu, Feng Liu, Haishuai Wang, Ning Ma, Jiajun Bu, Bo Han

    Abstract: $\textit{Implicit neural representations}$ (INRs) aim to learn a $\textit{continuous function}$ (i.e., a neural network) to represent an image, where the input and output of the function are pixel coordinates and RGB/Gray values, respectively. However, images tend to consist of many objects whose colors are not perfectly consistent, resulting in the challenge that image is actually a $\textit{disc…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023

  15. arXiv:2309.08966  [pdf, other]

    cs.CV

    FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization

    Authors: Nan Ma, Mohan Wang, Yiheng Han, Yong-** Liu

    Abstract: Cross-modality point cloud registration is confronted with significant challenges due to inherent differences in modalities between different sensors. We propose a cross-modality point cloud registration framework FF-LOGO: a cross-modality point cloud registration method with feature filtering and local-global optimization. The cross-modality feature correlation filtering module extracts geometric…

    Submitted 12 April, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA), 7 pages, 2 figures

  16. arXiv:2309.07084  [pdf, other]

    cs.CV

    SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

    Authors: Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang

    Abstract: In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. Our strategy involves a data enhancement method named Polar Sampling, which densifies sparse objects and trains an assistant model to generate high-quality features as the supervision. These fea…

    Submitted 31 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV2023

  17. arXiv:2309.02171  [pdf, other]

    cs.IT eess.SP

    A Wideband MIMO Channel Model for Aerial Intelligent Reflecting Surface-Assisted Wireless Communications

    Authors: Shaoyi Liu, Nan Ma, Yaning Chen, Ke Peng, Dongsheng Xue

    Abstract: Compared to traditional intelligent reflecting surfaces (IRS), aerial IRS (AIRS) has unique advantages, such as more flexible deployment and wider service coverage. However, modeling AIRS in the channel presents new challenges due to their mobility. In this paper, a three-dimensional (3D) wideband channel model for AIRS and IRS joint-assisted multiple-input multiple-output (MIMO) communication syst…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 6 pages, 7 figures

  18. arXiv:2308.05309  [pdf, other]

    cs.LG cs.AI cs.SI

    Homophily-enhanced Structure Learning for Graph Clustering

    Authors: Ming Gu, Gaoming Yang, Sheng Zhou, Ning Ma, Jiawei Chen, Qiaoyu Tan, Meihan Liu, Jiajun Bu

    Abstract: Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structur…

    Submitted 30 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: 11 pages with 7 figures. Accepted by CIKM'23

  19. arXiv:2305.19069  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-source adversarial transfer learning for ultrasound image segmentation with limited similarity

    Authors: Yifu Zhang, Hongru Li, Tao Yang, Rui Tao, Zhengyuan Liu, Shimeng Shi, Jiansong Zhang, Ning Ma, Wu** Feng, Zhanhu Zhang, Xinyu Zhang

    Abstract: Lesion segmentation of ultrasound medical images based on deep learning techniques is a widely used method for diagnosing diseases. Although there is a large amount of ultrasound image data in medical centers and other places, labeled ultrasound datasets are a scarce resource, and it is likely that no datasets are available for new tissues/organs. Transfer learning provides the possibility to solv…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Submitted to Applied Soft Computing Journal

  20. arXiv:2304.03526  [pdf, other]

    cs.CV

    Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field

    Authors: Leheng Li, Qing Lian, Luozhou Wang, Ningning Ma, Ying-Cong Chen

    Abstract: This work explores the use of 3D generative models to synthesize training data for 3D vision tasks. The key requirements of the generative models are that the generated data should be photorealistic to match the real-world scenarios, and the corresponding 3D attributes should be aligned with given sampling labels. However, we find that the recent NeRF-based 3D GANs hardly meet the above requiremen…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  21. arXiv:2303.07399  [pdf, other]

    cs.CV

    RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose

    Authors: Tao Jiang, Peng Lu, Li Zhang, Ningsheng Ma, Rui Han, Chengqi Lyu, Yining Li, Kai Chen

    Abstract: Recent studies on 2D pose estimation have achieved excellent performance on public benchmarks, yet its application in the industrial community still suffers from heavy model parameters and high latency. In order to bridge this gap, we empirically explore key factors in pose estimation including paradigm, model architecture, training strategy, and deployment, and present a high-performance real-tim…

    Submitted 2 July, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  22. arXiv:2301.06826  [pdf, other]

    cs.RO

    Vis2Hap: Vision-based Haptic Rendering by Cross-modal Generation

    Authors: Guanqun Cao, Jiaqi Jiang, Ningtao Mao, Danushka Bollegala, Min Li, Shan Luo

    Abstract: To assist robots in teleoperation tasks, haptic rendering, which allows human operators to access a virtual touch feeling, has been developed in recent years. Most previous haptic rendering methods strongly rely on data collected by tactile sensors. However, tactile data is not widely available for robots due to their limited reachable space and the restrictions of tactile sensors. To eliminate the nee…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: This paper is accepted at ICRA 2023

  23. arXiv:2211.02701  [pdf, other]

    cs.LG cs.AI cs.CV

    MONAI: An open-source framework for deep learning in healthcare

    Authors: M. Jorge Cardoso, Wenqi Li, Richard Brown, Nic Ma, Eric Kerfoot, Yiheng Wang, Benjamin Murrey, Andriy Myronenko, Can Zhao, Dong Yang, Vishwesh Nath, Yufan He, Ziyue Xu, Ali Hatamizadeh, Andriy Myronenko, Wentao Zhu, Yun Liu, Mingxin Zheng, Yucheng Tang, Isaac Yang, Michael Zephyr, Behrooz Hashemian, Sachidanand Alle, Mohammad Zalbagi Darestani, Charlie Budd , et al. (32 additional authors not shown)

    Abstract: Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geo…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: www.monai.io

  24. arXiv:2211.01153  [pdf, other]

    cs.IR cs.DS cs.LG

    A Two Step Approach to Weighted Bipartite Link Recommendations

    Authors: Nathan Ma

    Abstract: Many real world person-person or person-product relationships can be modeled graphically. More specifically, bipartite graphs can be especially useful when modeling scenarios that involve two disjoint groups. As a result, many existing papers have utilized bipartite graphs for the classical link recommendation problem. In this paper, using the principle of bipartite graphs, we present another appr…

    Submitted 29 October, 2022; originally announced November 2022.

  25. arXiv:2210.17440  [pdf, other]

    cs.CL

    Semantic Novelty Detection and Characterization in Factual Text Involving Named Entities

    Authors: Nianzu Ma, Sahisnu Mazumder, Alexander Politowicz, Bing Liu, Eric Robertson, Scott Grigsby

    Abstract: Much of the existing work on text novelty detection has been studied at the topic level, i.e., identifying whether the topic of a document or a sentence is novel or not. Little work has been done at the fine-grained semantic level (or contextual level). For example, given that we know Elon Musk is the CEO of a technology company, the sentence "Elon Musk acted in the sitcom The Big Bang Theory" is…

    Submitted 31 October, 2022; originally announced October 2022.

    Comments: 28 pages, 2 figures

    ACM Class: I.2.7

  26. arXiv:2210.13291  [pdf, other]

    cs.LG cs.AI cs.CV cs.NI cs.SE

    NVIDIA FLARE: Federated Learning from Simulation to Real-World

    Authors: Holger R. Roth, Yan Cheng, Yuhong Wen, Isaac Yang, Ziyue Xu, Yuan-Ting Hsieh, Kristopher Kersten, Ahmed Harouni, Can Zhao, Kevin Lu, Zhihong Zhang, Wenqi Li, Andriy Myronenko, Dong Yang, Sean Yang, Nicola Rieke, Abood Quraini, Chester Chen, Daguang Xu, Nic Ma, Prerna Dogra, Mona Flores, Andrew Feng

    Abstract: Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and…

    Submitted 28 April, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at the International Workshop on Federated Learning, NeurIPS 2022, New Orleans, USA (https://federated-learning.org/fl-neurips-2022); Revised version v2: added Key Components list, system metrics for homomorphic encryption experiment; Extended v3 for journal submission

    Journal ref: IEEE Data Eng. Bull., Vol. 46, No. 1, 2023

  27. arXiv:2209.13259  [pdf, other]

    cs.DC

    Timeliness of Information for Computation-intensive Status Updates in Task-oriented Communications

    Authors: Xiaoqi Qin, Yanlin Li, Xianxin Song, Nan Ma, Chuan Huang, ** Zhang

    Abstract: Moving beyond just interconnected devices, the increasing interplay between communication and computation has fed the vision of real-time networked control systems. To obtain timely situational awareness, IoT devices continuously sample computation-intensive status updates, generate perception tasks and offload them to edge servers for processing. In this sense, the timeliness of information is co…

    Submitted 27 September, 2022; originally announced September 2022.

  28. arXiv:2205.09377  [pdf, other]

    cs.IT eess.SP

    Coexistence between Task- and Data-Oriented Communications: A Whittle's Index Guided Multi-Agent Reinforcement Learning Approach

    Authors: Ran Li, Chuan Huang, Xiaoqi Qin, Shengpei Jiang, Nan Ma, Shuguang Cui

    Abstract: We investigate the coexistence of task-oriented and data-oriented communications in an IoT system that shares a group of channels, and study the scheduling problem to jointly optimize the weighted age of incorrect information (AoII) and throughput, which are the performance metrics of the two types of communications, respectively. This problem is formulated as a Markov decision problem, which is di…

    Submitted 19 May, 2022; originally announced May 2022.

  29. arXiv:2205.00692  [pdf, other]

    cs.NI

    Energy-efficient Caching and Task offloading for Timely Status Updates in UAV-assisted VANETs

    Authors: Nan Hu, Xiaoqi Qin, Nan Ma, Yiming Liu, Yuanyuan Yao, ** Zhang

    Abstract: Intelligent edge network is maturing to enable smart and efficient transportation systems. In this letter, we consider unmanned aerial vehicle (UAV)-assisted vehicular networks where UAVs provide caching and computing services in complement with base station (BS). One major challenge is that vehicles need to obtain timely situational awareness via orchestration of ubiquitous caching and computing…

    Submitted 4 May, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  30. arXiv:2204.13590  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms

    Authors: Nachuan Ma, Jiahe Fan, Wenshuo Wang, ** Wu, Yu Jiang, Lihua Xie, Rui Fan

    Abstract: Computer vision algorithms have been prevalently utilized for 3-D road imaging and pothole detection for over two decades. Nonetheless, there is a lack of systematic survey articles on state-of-the-art (SoTA) computer vision techniques, especially deep learning models, developed to tackle these problems. This article first introduces the sensing systems employed for 2-D and 3-D road data acquisiti…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: accepted to Transportation Safety and Environment

  31. arXiv:2204.04288  [pdf, other]

    eess.AS cs.SD

    Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: Non-intrusive intelligibility prediction is important for its application in realistic scenarios, where a clean reference signal is difficult to access. The construction of many non-intrusive predictors requires either ground truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method for predicting speech…

    Submitted 6 July, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH2022

  32. arXiv:2204.04287  [pdf, other]

    eess.AS cs.SD q-bio.QM

    Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: Accurate objective speech intelligibility prediction algorithms are of great interest for many applications such as speech enhancement for hearing aids. Most algorithms measure the signal-to-noise ratios or correlations between the acoustic features of clean reference signals and degraded signals. However, these hand-picked acoustic features are usually not explicitly correlated with recognitio…

    Submitted 6 July, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH2022

  33. arXiv:2204.04284  [pdf, other]

    eess.AS cs.SD

    Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

    Authors: Zehai Tu, Jack Deadman, Ning Ma, Jon Barker

    Abstract: End-to-end models have achieved significant improvement on automatic speech recognition. One common method to improve performance of these models is expanding the data-space through data augmentation. Meanwhile, human auditory inspired front-ends have also demonstrated improvement for automatic speech recognisers. In this work, a well-verified auditory-based model, which can simulate various heari…

    Submitted 8 April, 2022; originally announced April 2022.

  34. arXiv:2112.02706  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

    Authors: Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, Lei Shu

    Abstract: Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT. Although several papers have tried to deal with both CF and K…

    Submitted 5 December, 2021; originally announced December 2021.

    Journal ref: NeurIPS 2021

  35. arXiv:2108.02948  [pdf, other]

    cs.CV

    Deep Learning-based Biological Anatomical Landmark Detection in Colonoscopy Videos

    Authors: Kaiwei Che, Chengwei Ye, Yibing Yao, Nachuan Ma, Ruo Zhang, Jiankun Wang, Max Q. -H. Meng

    Abstract: Colonoscopy is a standard imaging tool for visualizing the entire gastrointestinal (GI) tract of patients to capture lesion areas. However, it takes the clinicians excessive time to review a large number of images extracted from colonoscopy videos. Thus, automatic detection of biological anatomical landmarks within the colon is highly demanded, which can help reduce the burden of clinicians by pro…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: 9 pages, 7 figures

  36. arXiv:2107.06782  [pdf]

    q-fin.ST cs.LG

    Clustering and attention model based for intelligent trading

    Authors: Mimansa Rana, Nanxiang Mao, Ming Ao, Xiaohui Wu, Poning Liang, Matloob Khushi

    Abstract: The foreign exchange market has taken an important role in the global financial market. While foreign exchange trading brings high-yield opportunities to investors, it also brings certain risks. Since the establishment of the foreign exchange market in the 20th century, foreign exchange rate forecasting has become a hot issue studied by scholars from all over the world. Due to the complexity and n…

    Submitted 6 August, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

  37. arXiv:2107.06735  [pdf, other]

    cs.CV

    Semi-Supervised Hypothesis Transfer for Source-Free Domain Adaptation

    Authors: Ning Ma, Jiajun Bu, Lixian Lu, Jun Wen, Zhen Zhang, Sheng Zhou, Xifeng Yan

    Abstract: Domain Adaptation has been widely used to deal with the distribution shift in vision, language, multimedia etc. Most domain adaptation methods learn domain-invariant features with data from both domains available. However, such a strategy might be infeasible in practice when source data are unavailable due to data-privacy concerns. To address this issue, we propose a novel adaptation method via hy…

    Submitted 14 July, 2021; originally announced July 2021.

  38. Uncertainty-Guided Mixup for Semi-Supervised Domain Adaptation without Source Data

    Authors: Ning Ma, Jiajun Bu, Zhen Zhang, Sheng Zhou

    Abstract: Present domain adaptation methods usually perform explicit representation alignment by simultaneously accessing the source data and target data. However, the source data are not always available due to the privacy preserving consideration or bandwidth limitation. Source-free domain adaptation aims to solve the above problem by performing domain adaptation without accessing the source data. The ada…

    Submitted 14 July, 2021; originally announced July 2021.

    Report number: 11

    Journal ref: Volume 262, 28 February 2023, 110208

  39. arXiv:2106.04639  [pdf, other]

    cs.SD eess.AS

    Optimising Hearing Aid Fittings for Speech in Noise with a Differentiable Hearing Loss Model

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: Current hearing aids normally provide amplification based on a general prescriptive fitting, and the benefits provided by the hearing aids vary among different listening environments despite the inclusion of a noise suppression feature. Motivated by this fact, this paper proposes a data-driven machine learning technique to develop hearing aid fittings that are customised to speech in different noisy…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021

  40. arXiv:2103.09030  [pdf, other]

    cs.CV eess.IV

    A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition

    Authors: Jianbang Liu, Yuqi Fang, Delong Zhu, Nachuan Ma, ** Pan, Max Q. -H. Meng

    Abstract: Human activities have recently been hugely restricted by COVID-19. Robots that can conduct inter-floor navigation attract much public attention, since they can substitute human workers to conduct the service work. However, current robots either depend on human assistance or elevator retrofitting, and fully autonomous inter-floor navigation is still not available. As the very first step of inter-floor n…

    Submitted 22 March, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

  41. arXiv:2103.08569  [pdf, other]

    cs.SD cs.LG

    DHASP: Differentiable Hearing Aid Speech Processing

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: Hearing aids are expected to improve speech intelligibility for listeners with hearing impairment. An appropriate amplification fitting tuned for the listener's hearing disability is critical for good performance. The developments of most prescriptive fittings are based on data collected in subjective listening experiments, which are usually expensive and time-consuming. In this paper, we explore…

    Submitted 15 March, 2021; originally announced March 2021.

    Comments: To appear at ICASSP 2021

  42. arXiv:2101.03697  [pdf, other]

    cs.CV cs.AI cs.LG

    RepVGG: Making VGG-style ConvNets Great Again

    Authors: Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun

    Abstract: We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG…

    Submitted 29 March, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

    Comments: CVPR 2021

  43. arXiv:2012.03166  [pdf, other]

    cs.RO cs.AI eess.IV

    Conditional Generative Adversarial Networks for Optimal Path Planning

    Authors: Nachuan Ma, Jiankun Wang, Max Q. -H. Meng

    Abstract: Path planning plays an important role in autonomous robot systems. Effective understanding of the surrounding environment and efficient generation of optimal collision-free path are both critical parts for solving path planning problem. Although conventional sampling-based algorithms, such as the rapidly-exploring random tree (RRT) and its improved optimal version (RRT*), have been widely used in…

    Submitted 5 December, 2020; originally announced December 2020.

  44. arXiv:2012.00105  [pdf]

    cs.CY

    Using Data Analytics to predict students score

    Authors: Nang Laik Ma, Gim Hong Chua

    Abstract: Education is very important to Singapore, and the government has continued to invest heavily in our education system to become one of the world-class systems today. A strong foundation of Science, Technology, Engineering, and Mathematics (STEM) was what underpinned Singapore's development over the past 50 years. PISA is a triennial international survey that evaluates education systems worldwide by…

    Submitted 19 November, 2020; originally announced December 2020.

  45. arXiv:2009.04759  [pdf, other]

    cs.CV

    Activate or Not: Learning Customized Activation

    Authors: Ningning Ma, Xiangyu Zhang, Ming Liu, Jian Sun

    Abstract: We present a simple, effective, and general activation function we term ACON which learns to activate the neurons or not. Interestingly, we find Swish, the recent popular NAS-searched activation, can be interpreted as a smooth approximation to ReLU. Intuitively, in the same way, we approximate the more general Maxout family to our novel ACON family, which remarkably improves the performance and ma…

    Submitted 16 April, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

    Comments: CVPR 2021

  46. arXiv:2007.11824  [pdf, other]

    cs.CV

    Funnel Activation for Visual Recognition

    Authors: Ningning Ma, Xiangyu Zhang, Jian Sun

    Abstract: We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition. The forms of ReLU and PReLU are y = max(x, 0) and y = max(x, px), respectively, while FReLU is in the form of y = max(x, T(x)), where T(x) is the 2D spatial condition. More…

    Submitted 24 July, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  47. arXiv:2007.11823  [pdf, other]

    cs.CV

    WeightNet: Revisiting the Design Space of Weight Networks

    Authors: Ningning Ma, Xiangyu Zhang, Jiawei Huang, Jian Sun

    Abstract: We present a conceptually simple, flexible and effective framework for weight generating networks. Our approach is general that unifies two current distinct and extremely effective SENet and CondConv into the same framework on weight space. The method, called WeightNet, generalizes the two methods by simply adding one more grouped fully-connected layer to the attention activation layer. We use the…

    Submitted 24 July, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  48. arXiv:2007.11806  [pdf, other]

    cs.CV cs.RO

    Autonomous Removal of Perspective Distortion of Elevator Button Images based on Corner Detection

    Authors: Nachuan Ma, Jianbang Liu, Delong Zhu

    Abstract: Elevator button recognition is a critical function to realize the autonomous operation of elevators. However, challenging image conditions and various image distortions make it difficult to recognize buttons accurately. To fill this gap, we propose a novel deep learning-based approach, which aims to autonomously correct perspective distortions of elevator button images based on button corner detec…

    Submitted 1 September, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

  49. arXiv:2003.08246  [pdf, other]

    cs.LG stat.ML

    Adaptive-Step Graph Meta-Learner for Few-Shot Graph Classification

    Authors: Ning Ma, Jiajun Bu, Jieyu Yang, Zhen Zhang, Chengwei Yao, Zhi Yu, Sheng Zhou, Xifeng Yan

    Abstract: Graph classification aims to extract accurate information from graph-structured data for classification and is becoming more and more important in graph learning community. Although Graph Neural Networks (GNNs) have been successfully applied to graph classification tasks, most of them overlook the scarcity of labeled graph data in many applications. For example, in bioinformatics, obtaining protei…

    Submitted 23 June, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

  50. arXiv:2003.01294  [pdf, ps, other]

    cs.IT cs.LG

    Accelerating Generalized Benders Decomposition for Wireless Resource Allocation

    Authors: Mengyuan Lee, Ning Ma, Guanding Yu, Huaiyu Dai

    Abstract: Generalized Benders decomposition (GBD) is a globally optimal algorithm for mixed integer nonlinear programming (MINLP) problems, which are NP-hard and can be widely found in the area of wireless resource allocation. The main idea of GBD is decomposing an MINLP problem into a primal problem and a master problem, which are iteratively solved until their solutions converge. However, a direct impleme…

    Submitted 14 October, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: This paper was accepted by the IEEE Transactions on Wireless Communications