arXiv:2406.19666 [pdf, other]

CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion

Authors: Chih-Chung Hsu, Chih-Chien Ni, Chia-Ming Lee, Li-Wei Kang

Abstract: Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often needs to be addressed due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and a low-resolution (LR) HSI, subsequently fusing them to yield the desired HR-HSI. Although deep learning-based methods have shown promise in HR-MSI/LR-HSI fusion and LR-HSI super-resolution (SR), their substantial model complexities hinder deployment on resource-constrained imaging devices. This paper introduces a novel knowledge distillation (KD) framework for HR-MSI/LR-HSI fusion to achieve SR of LR-HSI. Our KD framework integrates the proposed Cross-Layer Residual Aggregation (CLRA) block to enhance efficiency for constructing a Dual Two-Streamed (DTS) network structure, designed to extract joint and distinct features from LR-HSI and HR-MSI simultaneously. To fully exploit the spatial and spectral feature representations of LR-HSI and HR-MSI, we propose a novel Cross Self-Attention (CSA) fusion module to adaptively fuse those features to improve the spatial and spectral quality of the reconstructed HR-HSI. Finally, the proposed KD-based joint loss function is employed to co-train the teacher and student networks. Our experimental results demonstrate that the student model not only achieves comparable or superior LR-HSI SR performance but also significantly reduces the model size and computational requirements. This marks a substantial advancement over existing state-of-the-art methods. The source code is available at https://github.com/ming053l/CSAKD.

Submitted 28 June, 2024; originally announced June 2024.

Comments: Submitted to TIP 2024

arXiv:2406.15743 [pdf, other]

CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation

Authors: Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, Xiaohu Yang

Abstract: Though many machine learning (ML)-based unit testing generation approaches have been proposed and indeed achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate partial content of a unit test, mainly focusing on test oracle generation; (2) mismatch the test prefix with the test oracle semantically; and (3) are highly bound to closed-source models, eventually damaging data security. We propose CasModaTest, a cascaded, model-agnostic, and end-to-end unit test generation framework, to alleviate the above limitations with two cascaded stages: test prefix generation and test oracle generation. Then, we manually build large-scale demo pools to provide CasModaTest with high-quality test prefix and test oracle examples. Finally, CasModaTest automatically assembles the generated test prefixes and test oracles and compiles or executes them to check their effectiveness, optionally followed by several attempts to fix the errors occurring in the compiling and executing phases. To evaluate the effectiveness of CasModaTest, we conduct large-scale experiments on a widely used dataset (Defects4J) and compare it with four state-of-the-art (SOTA) approaches by considering two performance measures. The experimental results indicate that CasModaTest outperforms all SOTAs with a substantial improvement (i.e., 60.62%-352.55% in terms of accuracy, 2.83%-87.27% in terms of focal method coverage). Besides, we also conduct experiments of CasModaTest on different open-source LLMs and find that CasModaTest can also achieve significant improvements over SOTAs (39.82%-293.96% and 9.25%-98.95% in terms of accuracy and focal method coverage, respectively) in end-to-end unit test generation.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 14 pages, 7 figures

arXiv:2406.12415 [pdf, other]

doi 10.1145/3643991.3644886

MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation

Authors: Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang

Abstract: We constructed a new large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt advanced tools to ensure the extracted code integrality and enrich the code with four different transformed representations. In total, MegaVul contains 17,380 vulnerabilities collected from 992 open-source repositories spanning 169 different vulnerability types disclosed from January 2006 to October 2023. Thus, MegaVul can be used for a variety of software security-related tasks including detecting vulnerabilities and assessing vulnerability severity. All information is stored in the JSON format for easy usage. MegaVul is publicly available on GitHub and will be continuously updated. It can be easily extended to other programming languages.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures

arXiv:2406.02609 [pdf, other]

Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation

Authors: Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang

Abstract: Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.

Submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.03335 by other authors

arXiv:2406.02009 [pdf, other]

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

Authors: Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-supervised representations that are phonetically rich as the training target for the autoregressive language model. Subsequently, a non-autoregressive model is employed to predict discrete acoustic codecs that contain fine-grained acoustic details. The TTS model focuses solely on linguistic modeling during autoregressive training, thereby reducing the error propagation that occurs in non-autoregressive training. Both objective and subjective evaluations validate the effectiveness of our proposed method.

Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2405.11196 [pdf, other]

Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models

Authors: Yan Wang, Xiaoning Li, Tien Nguyen, Shaohua Wang, Chao Ni, Ling Ding

Abstract: Pre-trained Large Language Models (LLMs) have achieved remarkable successes in several domains. However, code-oriented LLMs are heavy in computational complexity, which grows quadratically with the length of the input. Toward simplifying the input program of an LLM, the state-of-the-art approach uses strategies that filter the input code tokens based on the attention scores given by the LLM. The decision to simplify the input should not rely on the attention patterns of an LLM, as these patterns are influenced by both the model architecture and the pre-training dataset. Since the model and dataset are part of the solution domain, not the problem domain where the input belongs, the outcome may differ when the model is pre-trained on a different dataset. We propose SlimCode, a model-agnostic code simplification solution for LLMs that depends on the nature of input code tokens. In an empirical study of LLMs including CodeBERT, CodeT5, and GPT-4 on two main tasks, code search and summarization, we report that 1) the removal ratio of code has a linear-like relation with the saving ratio on training time, 2) the impact of categorized tokens on code simplification can vary significantly, 3) the impact of categorized tokens on code simplification is task-specific but model-agnostic, and 4) the above findings hold for the paradigm-prompt engineering and interactive in-context learning. The empirical results showed that SlimCode can improve the state-of-the-art technique by 9.46% and 5.15% in terms of MRR and BLEU score on code search and summarization. Moreover, SlimCode is 133 times faster than the state-of-the-art approach. Additionally, SlimCode can reduce the cost of invoking GPT-4 by up to 24% per API query, while still producing comparable results to those with the original code.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2404.02056 [pdf, other]

Multitask-based Evaluation of Open-Source LLM on Software Vulnerability

Authors: Xin Yin, Chao Ni, Shaohua Wang

Abstract: This paper proposes a pipeline for quantitatively evaluating interactive LLMs using publicly available datasets. We carry out an extensive technical evaluation of LLMs using Big-Vul covering four different common software vulnerability tasks. We evaluate the multitask and multilingual aspects of LLMs based on this dataset. We find that the existing state-of-the-art methods are generally superior to LLMs in software vulnerability detection. Although LLMs improve accuracy when providing context information, they still have limitations in accurately predicting severity ratings for certain CWE types. In addition, LLMs demonstrate some ability to locate vulnerabilities for certain CWE types, but their performance varies among different CWE types. Finally, LLMs show uneven performance in generating CVE descriptions for various CWE types, with limited accuracy in a few-shot setting. Overall, though LLMs perform well in some aspects, they still need improvement in understanding the subtle differences in code vulnerabilities and the ability to describe vulnerabilities to fully realize their potential. Our evaluation pipeline provides valuable insights for further enhancing LLMs' software vulnerability handling capabilities.

Submitted 25 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.13219 [pdf, other]

Diffusion Model for Data-Driven Black-Box Optimization

Authors: Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang

Abstract: Generative AI has redefined artificial intelligence, enabling the creation of innovative content and customized solutions that drive business practices into a new era of efficiency and creativity. In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables. Consider the practical scenario where one wants to optimize some structured design in a high-dimensional space, based on massive unlabeled data (representing design variables) and a small labeled dataset. We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons. The goal is to generate new designs that are near-optimal and preserve the designed latent structures. Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models for modeling complex distributions. In particular, we propose a reward-directed conditional diffusion model, to be trained on the mixed data, for sampling a near-optimal solution conditioned on high predicted rewards. Theoretically, we establish sub-optimality error bounds for the generated designs. The sub-optimality gap nearly matches the optimal guarantee in off-policy bandits, demonstrating the efficiency of reward-directed diffusion models for black-box optimization. Moreover, when the data admits a low-dimensional latent subspace structure, our model efficiently generates high-fidelity designs that closely respect the latent structure. We provide empirical experiments validating our model in decision-making and content-creation tasks.

Submitted 19 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2307.07055

arXiv:2403.00807 [pdf]

Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models

Authors: Chunhe Ni, Jiang Wu, Hongbo Wang, Wenran Lu, Chenwei Zhang

Abstract: Large Language Models (LLMs) are a class of generative AI models built using the Transformer network, capable of leveraging vast datasets to identify, summarize, translate, predict, and generate language. LLMs promise to revolutionize society, yet training these foundational models poses immense challenges. Semantic vector search within large language models is a potent technique that can significantly enhance search result accuracy and relevance. Unlike traditional keyword-based search methods, semantic search utilizes the meaning and context of words to grasp the intent behind queries and deliver more precise outcomes. Elasticsearch, an exceptionally scalable and robust search engine designed for indexing and searching extensive datasets, emerges as one of the most popular tools for implementing semantic search. In this article, we delve into the fundamentals of semantic search and explore how to harness Elasticsearch and Transformer models to bolster large language model processing paradigms. We gain a comprehensive understanding of semantic search principles and acquire practical skills for implementing semantic search in real-world model application scenarios.

Submitted 24 February, 2024; originally announced March 2024.

arXiv:2403.00806 [pdf]

Enhanced User Interaction in Operating Systems through Machine Learning Language Models

Authors: Chenwei Zhang, Wenran Lu, Chunhe Ni, Hongbo Wang, Jiang Wu

Abstract: With large language models showing human-like logical reasoning and understanding ability, whether agents based on them can simulate the interaction behavior of real users, and so build a reliable virtual A/B test scenario for recommendation research, is an urgent, important, and economically valuable problem. The combination of interaction design and machine learning can provide a more efficient and personalized user experience for products and services. This personalized service can meet the specific needs of users and improve user satisfaction and loyalty. Second, the interactive system can understand the user's views and needs for the product by providing a good user interface and interactive experience, and then use machine learning algorithms to improve and optimize the product. This iterative optimization process can continuously improve the quality and performance of the product to meet the changing needs of users. At the same time, designers need to consider how these algorithms and tools can be combined with interactive systems to provide a good user experience. This paper explores the potential applications of large language models, machine learning and interaction design for user interaction in recommendation systems and operating systems. By integrating these technologies, more intelligent and personalized services can be provided to meet user needs and promote continuous improvement and optimization of products. This is of great value for both recommendation research and user experience applications.

Submitted 24 February, 2024; originally announced March 2024.

arXiv:2402.12916 [pdf]

Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models

Authors: Jiang Wu, Hongbo Wang, Chunhe Ni, Chenwei Zhang, Wenran Lu

Abstract: Data Pipeline plays an indispensable role in tasks such as modeling machine learning and developing data products. With the increasing diversification and complexity of data sources, as well as the rapid growth of data volumes, building an efficient Data Pipeline has become crucial for improving work efficiency and solving complex problems. This paper focuses on exploring how to optimize data flow through automated machine learning methods by integrating AutoML with Data Pipeline. We will discuss how to leverage AutoML technology to enhance the intelligence of Data Pipeline, thereby achieving better results in machine learning tasks. By delving into the automation and optimization of data flows, we uncover key strategies for constructing efficient data pipelines that can adapt to the ever-changing data landscape. This not only accelerates the modeling process but also provides innovative solutions to complex problems, enabling more significant outcomes in increasingly intricate data domains. Keywords: Data Pipeline Training; AutoML; Data environment; Machine learning

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2312.11825 [pdf, other]

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

Authors: Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Jiaqi Yip, Dianwen Ng, Bin Ma

Abstract: Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of applying the recurrent neural networks (RNNs) that use traditional recurrent connections, we present a recurrent module based on a feedforward sequential memory network (FSMN), which is considered an "RNN-free" recurrent network due to its ability to capture recurrent patterns without using recurrent connections. Our recurrent module mainly comprises an enhanced dilated FSMN block that uses gated convolutional units (GCU) and dense connections. In addition, a bottleneck layer and an output layer are also added for controlling information flow. The recurrent module relies on linear projections and convolutions for seamless, parallel processing of the entire sequence. The integrated MossFormer2 hybrid model demonstrates remarkable enhancements over MossFormer and surpasses other state-of-the-art methods in WSJ0-2/3mix, Libri2Mix, and WHAM!/WHAMR! benchmarks.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, accepted by ICASSP 2024

arXiv:2311.10261 [pdf, other]

Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

Authors: Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang

Abstract: Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera, and reliable 3D information from the radar can potentially achieve an efficient, cheap, and portable solution for 3D object perception tasks. It can also be robust to different lighting or all-weather driving scenarios due to the capability of mmWave radars. In this paper, we introduce the CRUW3D dataset, including 66K synchronized and well-calibrated camera, radar, and LiDAR frames in various driving scenarios. Unlike other large-scale autonomous driving datasets, our radar data is in the format of radio frequency (RF) tensors that contain not only 3D location information but also spatio-temporal semantic information. This kind of radar format can enable machine learning models to generate more reliable object perception results after interacting and fusing the information or features between the camera and radar.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2309.12608 [pdf, other]

SPGM: Prioritizing Local Features for enhanced speech separation performance

Authors: Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

Abstract: Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure consisting of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters. Model and weights are available at huggingface.co/yipjiaqi/spgm

Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: This paper was accepted by ICASSP 2024

arXiv:2309.09413 [pdf, other]

Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

Authors: Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter-Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

Abstract: Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, not many people understand how and why this is so. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance but also make them vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, which enhances robustness against background noise. Additionally, we propose an effective modification on noise prompts to show that they are capable of zero-shot learning on adapting to out-of-distribution noise environments.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.00154 [pdf, other]

Learning From Peers: A Survey of Perception and Utilization of Online Peer Support Among Informal Dementia Caregivers

Authors: Zhijun Yin, Lauren Stratton, Qingyuan Song, Congning Ni, Lijun Song, Patricia A. Commiskey, Qingxia Chen, Monica Moreno, Sam Fazio, Bradley A. Malin

Abstract: Informal dementia caregivers are those who care for a person living with dementia (PLWD) without receiving payment (e.g., family members, friends, or other unpaid caregivers). These informal caregivers are subject to substantial mental, physical, and financial burdens. Online communities enable these caregivers to exchange caregiving strategies and communicate experiences with other caregivers whom they generally do not know in real life. Research has demonstrated the benefits of peer support in online communities, but they are limited in focusing merely on caregivers who are already online users. In this paper, we designed and administered a survey to investigate the perception and utilization of online peer support from 140 informal dementia caregivers (with 100 online-community caregivers). Our findings show that the behavior to access any online community is only significantly associated with their belief in the value of online peer support (p = 0.006). Moreover, 33 (83%) of the 40 non-online-community caregivers had a belief score above 24, a score assigned when a neutral option is selected for each belief question. The reasons most articulated for not accessing any online community were no time to do so (14; 10%), and insufficient online information searching skills (9; 6%). Our findings suggest that online peer support is valuable, but practical strategies are needed to assist informal dementia caregivers who have limited time or searching skills.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.11237 [pdf, other]

Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Authors: Chao Ni, Xin Yin, Kaiwen Yang, Dehai Zhao, Zhenchang Xing, Xin Xia

Abstract: Though many deep learning (DL)-based vulnerability detection approaches have been proposed and indeed achieved remarkable performance, they still have limitations in the generalization as well as the practical usage. More precisely, existing DL-based approaches (1) perform negatively on prediction tasks among functions that are lexically similar but have contrary semantics; (2) provide no intuitive developer-oriented explanations to the detected results. In this paper, we propose a novel approach named SVulD, a function-level Subtle semantic embedding for Vulnerability Detection along with intuitive explanations, to alleviate the above limitations. Specifically, SVulD firstly trains a model to learn distinguishing semantic representations of functions regardless of their lexical similarity. Then, for the detected vulnerable functions, SVulD provides natural language explanations (e.g., root cause) of results to help developers intuitively understand the vulnerabilities. To evaluate the effectiveness of SVulD, we conduct large-scale experiments on a widely used practical vulnerability dataset and compare it with four state-of-the-art (SOTA) approaches by considering five performance measures. The experimental results indicate that SVulD outperforms all SOTAs with a substantial improvement (i.e., 23.5%-68.0% in terms of F1-score, 15.9%-134.8% in terms of PR-AUC and 7.4%-64.4% in terms of Accuracy). Besides, we conduct a user-case study to evaluate the usefulness of SVulD for developers on understanding the vulnerable code and the participants' feedback demonstrates that SVulD is helpful for development practice.

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted By FSE'23

arXiv:2307.07055 [pdf, other]

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

Authors: Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

Abstract: We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward value, where the optimality gap aligns with the off-policy bandit regret in the feature subspace. The improvement in rewards obtained is influenced by the interplay between the strength of the reward signal, the distribution shift, and the cost of off-support extrapolation. We provide empirical results to validate our theory and highlight the relationship between the strength of extrapolation and the quality of generated samples.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2305.12121 [pdf, other]

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

Authors: Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

Abstract: In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling. ACA is able to distill large, variable-length sequences into small, fixed-sized latents by attending a small query to large key and value matrices. In ACA-Net, we build a Multi-Layer Aggregation (MLA) block using ACA to generate fixed-sized identity vectors from variable-length inputs. Through global attention, ACA-Net acts as an efficient global feature extractor that adapts to temporal variability unlike existing SV models that apply a fixed function for pooling over the temporal dimension which may obscure information about the signal's non-stationary temporal variability. Our experiments on the WSJ0-1talker show ACA-Net outperforms a strong baseline by 5% relative improvement in EER using only 1/5 of the parameters.

Submitted 20 May, 2023; originally announced May 2023.

Comments: Accepted to INTERSPEECH 2023

arXiv:2305.01170 [pdf, other]

Contrastive Speech Mixup for Low-resource Keyword Spotting

Authors: Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Eng Siong Chng, Bin Ma

Abstract: Most of the existing neural-based models for keyword spotting (KWS) in smart devices require thousands of training samples to learn a decent audio representation. However, with the rising demand for smart devices to become more personalized, KWS models need to adapt quickly to smaller user samples. To tackle this challenge, we propose a contrastive speech mixup (CosMix) learning algorithm for low-resource KWS. CosMix introduces an auxiliary contrastive loss to the existing mixup augmentation technique to maximize the relative similarity between the original pre-mixed samples and the augmented samples. The goal is to inject enhancing constraints to guide the model towards simpler but richer content-based speech representations from two augmented views (i.e. noisy mixed and clean pre-mixed utterances). We conduct our experiments on the Google Speech Command dataset, where we trim the size of the training set to as small as 2.5 mins per keyword to simulate a low-resource condition. Our experimental results show a consistent improvement in the performance of multiple models, which exhibits the effectiveness of our method.

Submitted 1 May, 2023; originally announced May 2023.

Comments: Accepted by ICASSP 2023

arXiv:2304.05297 [pdf, other]

Neural Network Approach to Portfolio Optimization with Leverage Constraints: a Case Study on High Inflation Investment

Authors: Chendi Ni, Yuying Li, Peter A. Forsyth

Abstract: Motivated by the current global high inflation scenario, we aim to discover a dynamic multi-period allocation strategy to optimally outperform a passive benchmark while adhering to a bounded leverage limit. To this end, we formulate an optimal control problem to outperform a benchmark portfolio throughout the investment horizon. Assuming the asset prices follow the jump-diffusion model during high inflation periods, we first establish a closed-form solution for the optimal strategy that outperforms a passive strategy under the cumulative quadratic tracking difference (CD) objective, assuming continuous trading and no bankruptcy. To obtain strategies under the bounded leverage constraint among other realistic constraints, we then propose a novel leverage-feasible neural network (LFNN) to represent control, which converts the original constrained optimization problem into an unconstrained optimization problem that is computationally feasible with standard optimization methods. We establish mathematically that the LFNN approximation can yield a solution that is arbitrarily close to the solution of the original optimal control problem with bounded leverage. We further apply the LFNN approach to a four-asset investment scenario with bootstrap resampled asset returns from the filtered high inflation regime data. The LFNN strategy is shown to consistently outperform the passive benchmark strategy by about 200 bps (median annualized return), with a greater than 90% probability of outperforming the benchmark at the end of the investment horizon.

Submitted 24 May, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2303.15124 [pdf, other]

Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

Authors: Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

Abstract: Medical images often contain artificial markers added by doctors, which can negatively affect the accuracy of AI-based diagnosis. To address this issue and recover the missing visual contents, inpainting techniques are highly needed. However, existing inpainting methods require manual mask input, limiting their application scenarios. In this paper, we introduce a novel blind inpainting method that automatically completes visual contents without specifying masks for target areas in an image. Our proposed model includes a mask-free reconstruction network and an object-aware discriminator. The reconstruction network consists of two branches that predict the corrupted regions with artificial markers and simultaneously recover the missing visual contents. The object-aware discriminator relies on the powerful recognition capabilities of the dense object detector to ensure that the markers of reconstructed images cannot be detected in any local regions. As a result, the reconstructed image can be close to the clean one as much as possible. Our proposed method is evaluated on different medical image datasets, covering multiple imaging modalities such as ultrasound (US), magnetic resonance imaging (MRI), and electron microscopy (EM), demonstrating that our method is effective and robust against various unknown missing region patterns.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.07610 [pdf, other]

Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences

Authors: Yunjie Ji, Yan Gong, Yiping Peng, Chao Ni, Peiyan Sun, Dongyu Pan, Baochang Ma, Xiangang Li

Abstract: As a natural language assistant, ChatGPT is capable of performing various tasks, including but not limited to article generation, code completion, and data analysis. Furthermore, ChatGPT has consistently demonstrated a remarkable level of accuracy and reliability in terms of content evaluation, exhibiting the capability of mimicking human preferences. To further explore ChatGPT's potential in this regard, a study is conducted to assess its ability to rank content. In order to do so, a test set consisting of prompts is created, covering a wide range of use cases, and five models are utilized to generate corresponding responses. ChatGPT is then instructed to rank the responses generated by these models. The results on the test set show that ChatGPT's ranking preferences are consistent with human to a certain extent. This preliminary experimental finding implies that ChatGPT's zero-shot ranking capability could be used to reduce annotation pressure in a number of ranking tasks.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2303.00179 [pdf, other]

A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data

Authors: Haizhou Du, Chengdong Ni

Abstract: Emerging distributed applications recently boosted the development of decentralized machine learning, especially in IoT and edge computing fields. In real-world scenarios, the common problems of non-convexity and data heterogeneity result in inefficiency, performance degradation, and development stagnation. The bulk of studies concentrates on one of the issues mentioned above without having a more general framework that has been proven optimal. To this end, we propose a unified paradigm called UMP, which comprises two algorithms, D-SUM and GT-DSUM, based on the momentum technique with decentralized stochastic gradient descent (SGD). The former provides a convergence guarantee for general non-convex objectives. At the same time, the latter is extended by introducing gradient tracking, which estimates the global optimization direction to mitigate data heterogeneity (i.e., distribution drift). We can cover most momentum-based variants based on the classical heavy ball or Nesterov's acceleration with different parameters in UMP. In theory, we rigorously provide the convergence analysis of these two approaches for non-convex objectives and conduct extensive experiments, demonstrating a significant improvement in model accuracy by up to 57.6% compared to other methods in practice.

Submitted 28 February, 2023; originally announced March 2023.

Comments: 24 pages

ACM Class: I.2.11; I.2.6

arXiv:2302.14597 [pdf, other]

deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition

Authors: Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma

Abstract: Existing self-supervised pre-trained speech models have offered an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR). However, many current models are trained on a clean corpus from a single source, which tends to do poorly when noise is present during testing. Nonetheless, it is crucial to overcome the adverse influence of noise for real-world applications. In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow's redundancy-reduction principle. The new framework improves the HuBERT training algorithm by introducing auxiliary losses that drive the self- and cross-correlation matrix between pairwise noise-distorted embeddings towards identity matrix. This encourages the model to produce noise-agnostic speech representations. With this method, we report improved robustness in noisy environments, including unseen noises, without impairing the performance on the clean set.

Submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2210.16976 [pdf, other]

Representation Learning for General-sum Low-rank Markov Games

Authors: Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

Abstract: We study multi-agent general-sum Markov games with nonlinear function approximation. We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation. The goal is to design an algorithm that (1) finds an $\varepsilon$-equilibrium policy sample efficiently without prior knowledge of the environment or the representation, and (2) permits a deep-learning friendly implementation. We leverage representation learning and present a model-based and a model-free approach to construct an effective representation from the collected data. For both approaches, the algorithm achieves a sample complexity of poly$(H,d,A,1/\varepsilon)$, where $H$ is the game horizon, $d$ is the dimension of the feature vector, $A$ is the size of the joint action space and $\varepsilon$ is the optimality gap. When the number of players is large, the above sample complexity can scale exponentially with the number of players in the worst case. To address this challenge, we consider Markov games with a factorized transition structure and present an algorithm that escapes such exponential scaling. To our best knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation. We accompany our theoretical result with a neural network-based implementation of our algorithm and evaluate it against the widely used deep RL baseline, DQN with fictitious play.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2210.03580 [pdf]

doi 10.1109/ICOT.2017.8336109

Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

Authors: Lei Wang, Rong Tong, Cheung Chi Leung, Sunil Sivadas, Chongjia Ni, Bin Ma

Abstract: This paper provides an overall introduction of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As not much existing work has been carried out on such regional languages, a few difficulties should be addressed before building the systems: limitation on speech and text resources, lack of linguistic knowledge, etc. This work takes Bahasa Indonesia and Thai as examples to illustrate the strategies of collecting various resources required for building ASR systems.

Submitted 7 October, 2022; originally announced October 2022.

Comments: Published by the 2017 IEEE International Conference on Orange Technologies (ICOT 2017)

ACM Class: I.2.7

arXiv:2209.06360 [pdf, other]

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Authors: Dianwen Ng, Jia Qi Yip, Tanmay Surana, Zhao Yang, Chong Zhang, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma

Abstract: Noise robustness in keyword spotting remains a challenge as many models fail to overcome the heavy influence of noises, causing the deterioration of the quality of feature embeddings. We proposed a contrastive regularization method called Inter-Intra Contrastive Regularization (I2CR) to improve the feature representations by guiding the model to learn the fundamental speech information specific to the cluster. This involves maximizing the similarity across Intra and Inter samples of the same class. As a result, it pulls the instances closer to more generalized representations that form more prominent clusters and reduces the adverse impact of noises. We show that our method provides consistent improvements in accuracy over different backbone model architectures under different noise environments. We also demonstrate that our proposed framework has improved the accuracy of unseen out-of-domain noises and unseen variant noise SNRs. This indicates the significance of our work with the overall refinement in noise robustness.

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2206.02092 [pdf, other]

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Authors: Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

Abstract: Directed Evolution (DE), a landmark wet-lab method originating in the 1960s, enables discovery of novel protein designs via evolving a population of candidate sequences. Recent advances in biotechnology have made it possible to collect high-throughput data, allowing the use of machine learning to map out a protein's sequence-to-function relation. There is a growing interest in machine learning-assisted DE for accelerating protein optimization. Yet the theoretical understanding of DE, as well as the use of machine learning in DE, remains limited. In this paper, we connect DE with the bandit learning theory and make a first attempt to study regret minimization in DE. We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements. TS-DE updates a posterior of the function based on collected measurements. It uses a posterior-sampled function estimate to guide the crossover recombination and mutation steps in DE. In the case of a linear model, we show that TS-DE enjoys a Bayesian regret of order $\tilde O(d^{2}\sqrt{MT})$, where $d$ is feature dimension, $M$ is population size and $T$ is number of rounds. This regret bound is nearly optimal, confirming that bandit learning can provably accelerate DE. It may have implications for more general sequence optimization and evolutionary algorithms.

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2205.03996 [pdf, other]

doi 10.1109/JETCAS.2022.3171522

Hardware-Robust In-RRAM-Computing for Object Detection

Authors: Yu-Hsiang Chiang, Cheng En Ni, Yun Sung, Tuo-Hung Hou, Tian-Sheuan Chang, Shyh Jye Jou

Abstract: In-memory computing is becoming a popular architecture for deep-learning hardware accelerators recently due to its highly parallel computing, low power, and low area cost. However, in-RRAM computing (IRC) suffered from large device variation and numerous nonideal effects in hardware. Although previous approaches including these effects in model training successfully improved variation tolerance, they only considered part of the nonideal effects and relatively simple classification tasks. This paper proposes a joint hardware and software optimization strategy to design a hardware-robust IRC macro for object detection. We lower the cell current by using a low word-line voltage to enable a complete convolution calculation in one operation that minimizes the impact of nonlinear addition. We also implement ternary weight mapping and remove batch normalization for better tolerance against device variation, sense amplifier variation, and IR drop problem. An extra bias is included to overcome the limitation of the current sensing range. The proposed approach has been successfully applied to a complex object detection task with only 3.85% mAP drop, whereas a naive design suffers catastrophic failure under these nonideal effects.

Submitted 8 May, 2022; originally announced May 2022.

Comments: 10 pages, 18 figures

arXiv:2204.04856 [pdf, other]

Defect Identification, Categorization, and Repair: Better Together

Authors: Chao Ni, Kaiwen Yang, Xin Xia, David Lo, Xiang Chen, Xiaohu Yang

Abstract: Just-In-Time defect prediction (JIT-DP) models can identify defect-inducing commits at check-in time. Even though previous studies have achieved a great progress, these studies still have the following limitations: 1) useful information (e.g., semantic information and structure information) are not fully used; 2) existing work can only predict a commit as buggy one or clean one without more information about what type of defect it is; 3) a commit may involve changes in many files, which cause difficulty in locating the defect; 4) prior studies treat defect identification and defect repair as separate tasks, none aims to handle both tasks simultaneously. In this paper, to handle aforementioned limitations, we propose a comprehensive defect prediction and repair framework named CompDefect, which can identify whether a changed function (a more fine-grained level) is defect-prone, categorize the type of defect, and repair such a defect automatically if it falls into several scenarios, e.g., defects with single statement fixes, or those that match a small set of defect templates. Generally, the first two tasks in CompDefect are treated as a multiclass classification task, while the last one is treated as a sequence generation task. The whole input of CompDefect consists of three parts (exampled with positive functions): the clean version of a function (i.e., the version before defect introduced), the buggy version of a function and the fixed version of a function. In multiclass classification task, CompDefect categorizes the type of defect via multiclass classification with the information in both the clean version and the buggy version. In code sequence generation task, CompDefect repairs the defect once identified or keeps it unchanged.

Submitted 10 April, 2022; originally announced April 2022.

Comments: 22 pages, 4 figures

arXiv:2202.13715 [pdf, other]

doi 10.1109/LRA.2022.3186511

Fast and Compute-efficient Sampling-based Local Exploration Planning via Distribution Learning

Authors: Lukas Schmid, Chao Ni, Yuliang Zhong, Roland Siegwart, Olov Andersson

Abstract: Exploration is a fundamental problem in robotics. While sampling-based planners have shown high performance, they are oftentimes compute intensive and can exhibit high variance. To this end, we propose to directly learn the underlying distribution of informative views based on the spatial context in the robot's map. We further explore a variety of methods to also learn the information gain. We show in thorough experimental evaluation that our proposed system improves exploration performance by up to 28% over classical methods, and find that learning the gains in addition to the sampling distribution can provide favorable performance vs. compute trade-offs for compute-constrained systems. We demonstrate in simulation and on a low-cost mobile robot that our system generalizes well to varying environments.

Submitted 22 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: Accepted for IEEE RA-L. Open-source code: https://github.com/ethz-asl/cvae_exploration_planning, 8 pages, 12 figures

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7810-7817, July 2022

arXiv:2202.04970 [pdf, ps, other]

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Authors: Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

Abstract: Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical analysis has proved FQE to be minimax-optimal with tabular, linear and several nonparametric function families, its practical performance with more general function approximator is less theoretically understood. We focus on FQE with general differentiable function approximators, making our theory applicable to neural function approximations. We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator. In addition, we study bootstrapping FQE estimators for error distribution inference and estimating confidence intervals, accompanied by a Cramer-Rao lower bound that matches our upper bounds. The Z-estimation analysis provides a generalizable theoretical framework for studying off-policy estimation in RL and provides sharp statistical theory for FQE with differentiable function approximators.

Submitted 10 February, 2022; originally announced February 2022.

Comments: 39 pages

arXiv:2202.00076 [pdf, other]

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

Authors: Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Abstract: Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on policy gradient estimation error, that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramer-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.

Submitted 19 June, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

arXiv:2110.08545 [pdf, other]

A Unified Speaker Adaptation Approach for ASR

Authors: Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

Abstract: Transformer models have been used in automatic speech recognition (ASR) successfully and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach for adaptation, but it takes a lot of compute and may cause catastrophic forgetting of the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model which generalizes better to unseen test speakers by making use of speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which, to the best of our knowledge, has never been explored in ASR. Specifically, we gradually prune less contributing parameters on the model encoder to a certain sparsity level, and use the pruned parameters for adaptation, while freezing the unpruned parameters to keep the original model performance. We conduct experiments on the Librispeech dataset. Our proposed approach brings relative 2.74-6.52% word error rate (WER) reduction on general speaker adaptation. On target speaker adaptation, our method outperforms the baseline with up to 20.58% relative WER reduction, and surpasses the finetuning method by up to relative 2.54%. Besides, with extremely low-resource adaptation data (e.g., 1 utterance), our method could improve the WER by relative 6.53% with only a few epochs of training.

Submitted 16 October, 2021; originally announced October 2021.

Comments: Accepted by EMNLP 2021

arXiv:2105.01136 [pdf, other]

Learning Good State and Action Representations via Tensor Decomposition

Authors: Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang

Abstract: The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representations from empirical trajectories. The method exploits the MDP's tensor structure by kernelization, importance sampling and low-Tucker-rank approximation. This method can be further used to cluster states and actions respectively and find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and the preservation of diffusion distance after embedding. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation.

Submitted 19 February, 2023; v1 submitted 3 May, 2021; originally announced May 2021.

arXiv:2102.08607 [pdf, other]

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Authors: Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

Abstract: Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate the existing PG methods such as REINFORCE by the \emph{variance reduction} techniques. However, all existing variance-reduced PG methods heavily rely on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution. We show an $\tilde{\mathcal{O}}(ε^{-3})$ sample complexity for TSIVR-PG to find an $ε$-stationary policy. By assuming the overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $ε$-optimal policy with $\tilde{\mathcal{O}}(ε^{-2})$ samples.

Submitted 27 May, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

arXiv:2005.10407 [pdf, other]

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

Authors: Zhiping Zeng, Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma

Abstract: In this work, we study leveraging extra text data to improve low-resource end-to-end ASR under cross-lingual transfer learning setting. To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data due to the LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained using limited labeled data. Starting from this, we obtain further 25.4% relative WER reduction by transfer learning from another resource-rich language. Moreover, we obtain additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference compared to both LSTM and Transformer architectures.

Submitted 28 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.06842 [pdf, other]

doi 10.1145/3366424.3383570

Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph

Authors: Chien-Chun Ni, Kin Sum Liu, Nicolas Torzec

Abstract: In this paper, we describe an embedding-based entity recommendation framework for Wikipedia that organizes Wikipedia into a collection of graphs layered on top of each other, learns complementary entity representations from their topology and content, and combines them with a lightweight learning-to-rank approach to recommend related entities on Wikipedia. Through offline and online evaluations, we show that the resulting embeddings and recommendations perform well in terms of quality and user engagement. Balancing simplicity and quality, this framework provides default entity recommendations for English and other languages in the Yahoo! Knowledge Graph, which Wikipedia is a core subset of.

Submitted 14 April, 2020; originally announced April 2020.

Comments: 8 pages, 4 figures, 8 tables. To appear in Wiki Workshop 2020, Companion Proceedings of the Web Conference 2020 (WWW 20 Companion), Taipei, Taiwan

ACM Class: H.3.3

arXiv:1912.00863 [pdf, other]

Independent language modeling architecture for end-to-end ASR

Authors: Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li

Abstract: The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows for joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network (subnet), which incorporates the role of the language model (LM), is conditioned on the encoder output. This means that the acoustic encoder and the language model are entangled, which does not allow the language model to be trained separately from external text data. To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output. In this way, the decoupled subnet becomes an independently trainable LM subnet, which can easily be updated using the external text data. We study two strategies for updating the new architecture. Experimental results show that 1) the independent LM architecture benefits from external text data, achieving 9.3% and 22.8% relative character and word error rate reduction on Mandarin HKUST and English NSC datasets respectively; 2) the proposed architecture works well with an external LM and can be generalized to different amounts of labelled data.

Submitted 25 November, 2019; originally announced December 2019.

arXiv:1908.06560 [pdf, other]

doi 10.1016/j.infsof.2020.106441

Revisiting Heterogeneous Defect Prediction: How Far Are We?

Authors: Xiang Chen, Yanzhou Mu, Chao Ni, Zhanqi Cui

Abstract: Until now, researchers have proposed several novel heterogeneous defect prediction (HDP) methods with promising performance. To the best of our knowledge, whether HDP methods can perform significantly better than unsupervised methods has not yet been thoroughly investigated. In this article, we perform a replication study to have a holistic look at this issue. In particular, we compare five state-of-the-art HDP methods with five unsupervised methods. Final results surprisingly show that these HDP methods do not perform significantly better than some of the unsupervised methods (especially the simple unsupervised methods proposed by Zhou et al.) in terms of two non-effort-aware performance measures and four effort-aware performance measures. Then, we perform diversity analysis on defective modules via McNemar's test and find the prediction diversity is more obvious when the comparison is performed between the HDP methods and the unsupervised methods than the comparisons only between the HDP methods or between the unsupervised methods. This shows the HDP methods and the unsupervised methods are complementary to each other in identifying defective models to some extent. Finally, we investigate the feasibility of five HDP methods by considering two satisfactory criteria recommended by previous CPDP studies and find the satisfactory ratio of these HDP methods is still pessimistic. The above empirical results imply there is still a long way for heterogeneous defect prediction to go. More effective HDP methods need to be designed and the unsupervised methods should be considered as baselines.

Submitted 18 August, 2019; originally announced August 2019.

Comments: 40 pages, 13 figures

Journal ref: Information and Software Technology, 2021, 130: 106441

arXiv:1907.07129 [pdf, other]

Topology Based Scalable Graph Kernels

Authors: Kin Sum Liu, Chien-Chun Ni, Yu-Yao Lin, Jie Gao

Abstract: We propose a new graph kernel for graph classification and comparison using Ollivier Ricci curvature. The Ricci curvature of an edge in a graph describes the connectivity in the local neighborhood. An edge in a densely connected neighborhood has positive curvature and an edge serving as a local bridge has negative curvature. We use the edge curvature distribution to form a graph kernel which is then used to compare and cluster graphs. The curvature kernel uses purely the graph topology and thereby works for settings when node attributes are not available.

Submitted 14 July, 2019; originally announced July 2019.

arXiv:1907.03993 [pdf, other]

Community Detection on Networks with Ricci Flow

Authors: Chien-Chun Ni, Yu-Yao Lin, Feng Luo, Jie Gao

Abstract: Many complex networks in the real world have community structures -- groups of well-connected nodes with important functional roles. It has been well recognized that the identification of communities bears numerous practical applications. While existing approaches mainly apply statistical or graph theoretical/combinatorial methods for community detection, in this paper, we present a novel geometric approach which enables us to borrow powerful classical geometric methods and properties. By considering networks as geometric objects and communities in a network as a geometric decomposition, we apply curvature and discrete Ricci flow, which have been used to decompose smooth manifolds with astonishing successes in mathematics, to break down communities in networks. We tested our method on networks with ground-truth community structures, and experimentally confirmed the effectiveness of this geometric approach.

Submitted 9 July, 2019; originally announced July 2019.

Comments: 29 pages, 18 figures, to appear in Scientific Reports

arXiv:1905.01576 [pdf, ps, other]

Learning to Control in Metric Space with Optimal Regret

Authors: Lin F. Yang, Chengzhuo Ni, Mengdi Wang

Abstract: We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after $K$ episodes is $O(HL(KH)^{\frac{d-1}{d}})$ where $L$ is a smoothness parameter, and $d$ is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.

Submitted 4 May, 2019; originally announced May 2019.

arXiv:1904.03802 [pdf, other]

doi 10.21437/Interspeech.2019-1867

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

Authors: Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

Abstract: The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR using only monolingual data. Our method encourages the distributions of output token embeddings of monolingual languages to be similar, and hence, promotes the ASR model to easily code-switch between languages. Specifically, we propose to use Jensen-Shannon divergence and cosine distance based constraints. The former will enforce output embeddings of monolingual languages to possess similar distributions, while the latter simply brings the centroids of two distributions to be close to each other. Experimental results demonstrate high effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on Mandarin-English code-switching ASR task.

Submitted 31 July, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Comments: 5 pages, 3 figures, accepted to INTERSPEECH 2019

arXiv:1901.10655 [pdf, other]

On the Calibration of Multiclass Classification with Rejection

Authors: Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama

Abstract: We investigate the problem of multiclass classification with rejection, where a classifier can choose not to make a prediction to avoid critical misclassification. First, we consider an approach based on simultaneous training of a classifier and a rejector, which achieves the state-of-the-art performance in the binary case. We analyze this approach for the multiclass case and derive a general condition for calibration to the Bayes-optimal solution, which suggests that calibration is hard to achieve by general loss functions unlike the binary case. Next, we consider another traditional approach based on confidence scores, in which the existing work focuses on a specific class of losses. We propose rejection criteria for more general losses for this approach and guarantee calibration to the Bayes-optimal solution. Finally, we conduct experiments to validate the relevance of our theoretical findings.

Submitted 29 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

Comments: NeurIPS2019 camera-ready, 31 pages

arXiv:1809.00320 [pdf, other]

Network Alignment by Discrete Ollivier-Ricci Flow

Authors: Chien-Chun Ni, Yu-Yao Lin, Jie Gao, Xianfeng David Gu

Abstract: In this paper, we consider the problem of approximately aligning/matching two graphs. Given two graphs $G_{1}=(V_{1},E_{1})$ and $G_{2}=(V_{2},E_{2})$, the objective is to map nodes $u, v \in G_1$ to nodes $u',v'\in G_2$ such that when $u, v$ have an edge in $G_1$, very likely their corresponding nodes $u', v'$ in $G_2$ are connected as well. This problem, with subgraph isomorphism as a special case, has extra challenges when we consider matching complex networks exhibiting the small world phenomena. In this work, we propose to use the `Ricci flow metric' to define the distance between two nodes in a network. This is then used to define similarity of a pair of nodes in two networks respectively, which is the crucial step of network alignment. Specifically, the Ricci curvature of an edge describes intuitively how well the local neighborhood is connected. The graph Ricci flow uniformizes discrete Ricci curvature and induces a Ricci flow metric that is insensitive to node/edge insertions and deletions. With the new metric, we can map a node in $G_1$ to a node in $G_2$ whose distance vector to only a few preselected landmarks is the most similar. The robustness of the graph metric makes it outperform other methods when tested on various complex graph models and real world network data sets (Emails, Internet, and protein interaction networks). The source code for computing Ricci curvature and the Ricci flow metric is available: https://github.com/saibalmars/GraphRicciCurvature

Submitted 7 September, 2018; v1 submitted 2 September, 2018; originally announced September 2018.

Comments: Appears in the Proceedings of the 26th International Symposium on Graph Drawing and Network Visualization (GD 2018)

arXiv:1708.09129 [pdf, other]

Decentralized Trajectory Tracking Using Homology and Hodge Decomposition in Sensor Networks

Authors: Xiaotian Yin, Yu-Yao Lin, Chien-Chun Ni, Jiaxin Ding, Wei Han, Dengpan Zhou, Jie Gao, Xianfeng Gu

Abstract: With the recent development of localization and tracking systems for both indoor and outdoor settings, we consider the problem of sensing, representing and analyzing human movement trajectories that we expect to gather in the near future. In this paper, we propose to use the topological representation, which records how a target moves around the natural obstacles in the underlying environment. We demonstrate that the topological information can be sufficiently descriptive for many applications and efficient enough for storing, comparing and classifying these natural human trajectories. We pre-process the sensor network with a purely decentralized algorithm such that certain edges are given numerical weights. Then we can perform trajectory classification by simply summing up the edge weights along the trajectory. Our method supports real-time classification of trajectories with minimum communication cost. We test the effectiveness of our approach by showing how to classify randomly generated trajectories in a multi-level arts museum layout as well as how to distinguish real world taxi trajectories in a large city.

Submitted 30 August, 2017; originally announced August 2017.

Comments: 30 pages, 10 figures, submitted to ACM TSAS

arXiv:1708.04813 [pdf, ps, other]

Energy-Efficient Resource Allocation for Cache-Assisted Mobile Edge Computing

Authors: Ying Cui, Wen He, Chun Ni, Chengjun Guo, Zhi Liu

Abstract: In this paper, we jointly consider communication, caching and computation in a multi-user cache-assisted mobile edge computing (MEC) system, consisting of one base station (BS) of caching and computing capabilities and multiple users with computation-intensive and latency-sensitive applications. We propose a joint caching and offloading mechanism which involves task uploading and executing for tasks with uncached computation results as well as computation result downloading for all tasks at the BS, and efficiently utilizes multi-user diversity and multicasting opportunities. Then, we formulate the average total energy minimization problem subject to the caching and deadline constraints to optimally allocate the storage resource at the BS for caching computation results as well as the uploading and downloading time durations. The problem is a challenging mixed discrete-continuous optimization problem. We show that strong duality holds, and obtain an optimal solution using a dual method. To reduce the computational complexity, we further propose a low-complexity suboptimal solution. Finally, numerical results show that the proposed suboptimal solution outperforms existing comparison schemes.

Submitted 16 August, 2017; originally announced August 2017.

Comments: 9 pages, 8 figures, to appear in IEEE LCN 2017, Oct 9-12

arXiv:1701.07549 [pdf, other]

Robot Coverage Path Planning for General Surfaces Using Quadratic Differentials

Authors: Yu-Yao Lin, Chien-Chun Ni, Na Lei, Xianfeng David Gu, Jie Gao

Abstract: Robot Coverage Path Planning (i.e., providing full coverage of a given domain by one or multiple robots) is a classical problem in the field of robotics and motion planning. The goal is to provide nearly full coverage while also minimizing duplicately visited area. In this paper we focus on the scenario of path planning on general surfaces including planar domains with complex topology, complex terrain or general surfaces in 3D space. The main idea is to adopt a natural, intrinsic and global parametrization of the surface for robot path planning, namely the holomorphic quadratic differentials. Except for a small number of zero points (singularities), each point on the surface is given uv-coordinates naturally represented by a complex number. We show that natural, efficient robot paths can be obtained by using such coordinate systems. The method is based on intrinsic geometry and thus can be adapted to general surface exploration in 3D.

Submitted 25 January, 2017; originally announced January 2017.

Comments: 8 pages, 13 figures, IEEE ICRA 2017

Showing 1–50 of 54 results for author: Ni, C