Search | arXiv e-print repository

Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

Authors: Jie Feng, Xiaojian Zhong, Di Li, Weisheng Dong, Ronghua Shang, Licheng Jiao

Abstract: Band selection plays a crucial role in hyperspectral image classification by removing redundant and noisy bands and retaining discriminative ones. However, most existing deep learning-based methods are aimed at dealing with a specific band selection dataset, and need to retrain parameters for new datasets, which significantly limits their generalizability.To address this issue, a novel multi-teach… ▽ More Band selection plays a crucial role in hyperspectral image classification by removing redundant and noisy bands and retaining discriminative ones. However, most existing deep learning-based methods are aimed at dealing with a specific band selection dataset, and need to retrain parameters for new datasets, which significantly limits their generalizability.To address this issue, a novel multi-teacher multi-objective meta-learning network (M$^3$BS) is proposed for zero-shot hyperspectral band selection. In M$^3$BS, a generalizable graph convolution network (GCN) is constructed to generate dataset-agnostic base, and extract compatible meta-knowledge from multiple band selection tasks. To enhance the ability of meta-knowledge extraction, multiple band selection teachers are introduced to provide diverse high-quality experiences.strategy Finally, subsequent classification tasks are attached and jointly optimized with multi-teacher band selection tasks through multi-objective meta-learning in an end-to-end trainable way. Multi-objective meta-learning guarantees to coordinate diverse optimization objectives automatically and adapt to various datasets simultaneously. Once the optimization is accomplished, the acquired meta-knowledge can be directly transferred to unseen datasets without any retraining or fine-tuning. Experimental results demonstrate the effectiveness and efficiency of our proposed method on par with state-of-the-art baselines for zero-shot hyperspectral band selection. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2402.07158 [pdf, other]

Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces

Authors: Claudionor N. Coelho Jr, Hanchen Xiong, Tushar Karayil, Sree Koratala, Rex Shang, Jacob Bollinger, Mohamed Shabar, Syam Nair

Abstract: The advancement of Large Language Models (LLM) has also resulted in an equivalent proliferation in its applications. Software design, being one, has gained tremendous benefits in using LLMs as an interface component that extends fixed user stories. However, inclusion of LLM-based AI agents in software design often poses unexpected challenges, especially in the estimation of development efforts. Th… ▽ More The advancement of Large Language Models (LLM) has also resulted in an equivalent proliferation in its applications. Software design, being one, has gained tremendous benefits in using LLMs as an interface component that extends fixed user stories. However, inclusion of LLM-based AI agents in software design often poses unexpected challenges, especially in the estimation of development efforts. Through the example of UI-based user stories, we provide a comparison against traditional methods and propose a new way to enhance specifications of natural language-based questions that allows for the estimation of development effort by taking into account data sources, interfaces and algorithms. △ Less

Submitted 28 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2309.10947 [pdf, other]

How Do Analysts Understand and Verify AI-Assisted Data Analyses?

Authors: Ken Gu, Ruoxi Shang, Tim Althoff, Chenglong Wang, Steven M. Drucker

Abstract: Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to inc… ▽ More Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to incorrect conclusions. Therefore, validating AI assistance is crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe analysts in diverse verification approaches, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables with common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and how analysts' programming, analysis, and tool backgrounds reflect these behaviors. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences. △ Less

Submitted 4 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted to CHI 2024

arXiv:2306.13837 [pdf]

DEKGCI: A double-sided recommendation model for integrating knowledge graph and user-item interaction graph

Authors: Ya**g Yang, Zeyu Zeng, Mao Chen, Ruirui Shang

Abstract: Both knowledge graphs and user-item interaction graphs are frequently used in recommender systems due to their ability to provide rich information for modeling users and items. However, existing studies often focused on one of these sources (either the knowledge graph or the user-item interaction graph), resulting in underutilization of the benefits that can be obtained by integrating both sources… ▽ More Both knowledge graphs and user-item interaction graphs are frequently used in recommender systems due to their ability to provide rich information for modeling users and items. However, existing studies often focused on one of these sources (either the knowledge graph or the user-item interaction graph), resulting in underutilization of the benefits that can be obtained by integrating both sources of information. In this paper, we propose DEKGCI, a novel double-sided recommendation model. In DEKGCI, we use the high-order collaborative signals from the user-item interaction graph to enrich the user representations on the user side. Additionally, we utilize the high-order structural and semantic information from the knowledge graph to enrich the item representations on the item side. DEKGCI simultaneously learns the user and item representations to effectively capture the joint interactions between users and items. Three real-world datasets are adopted in the experiments to evaluate DEKGCI's performance, and experimental results demonstrate its high effectiveness compared to seven state-of-the-art baselines in terms of AUC and ACC. △ Less

Submitted 23 June, 2023; originally announced June 2023.

Comments: 24 pages, 6 figures,6 tables

arXiv:2305.05233 [pdf, other]

DynamicKD: An Effective Knowledge Distillation via Dynamic Entropy Correction-Based Distillation for Gap Optimizing

Authors: Songling Zhu, Ronghua Shang, Bo Yuan, Weitong Zhang, Yangyang Li, Licheng Jiao

Abstract: The knowledge distillation uses a high-performance teacher network to guide the student network. However, the performance gap between the teacher and student networks can affect the student's training. This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction to reduce the gap by adjusting the student instead of the teacher. Firstly, the effect of changing th… ▽ More The knowledge distillation uses a high-performance teacher network to guide the student network. However, the performance gap between the teacher and student networks can affect the student's training. This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction to reduce the gap by adjusting the student instead of the teacher. Firstly, the effect of changing the output entropy (short for output information entropy) in the student on the distillation loss is analyzed in theory. This paper shows that correcting the output entropy can reduce the gap. Then, a knowledge distillation algorithm based on dynamic entropy correction is created, which can correct the output entropy in real-time with an entropy controller updated dynamically by the distillation loss. The proposed algorithm is validated on the CIFAR100 and ImageNet. The comparison with various state-of-the-art distillation algorithms shows impressive results, especially in the experiment on the CIFAR100 regarding teacher-student pair resnet32x4-resnet8x4. The proposed algorithm raises 2.64 points over the traditional distillation algorithm and 0.87 points over the state-of-the-art algorithm CRD in classification accuracy, demonstrating its effectiveness and efficiency. △ Less

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2303.16212 [pdf, other]

A Multi-objective Complex Network Pruning Framework Based on Divide-and-conquer and Global Performance Impairment Ranking

Authors: Ronghua Shang, Songling Zhu, Yinan Wu, Weitong Zhang, Licheng Jiao, Songhua Xu

Abstract: Model compression plays a vital role in the practical deployment of deep neural networks (DNNs), and evolutionary multi-objective (EMO) pruning is an essential tool in balancing the compression rate and performance of the DNNs. However, due to its population-based nature, EMO pruning suffers from the complex optimization space and the resource-intensive structure verification process, especially i… ▽ More Model compression plays a vital role in the practical deployment of deep neural networks (DNNs), and evolutionary multi-objective (EMO) pruning is an essential tool in balancing the compression rate and performance of the DNNs. However, due to its population-based nature, EMO pruning suffers from the complex optimization space and the resource-intensive structure verification process, especially in complex networks. To this end, a multi-objective complex network pruning framework based on divide-and-conquer and global performance impairment ranking (EMO-DIR) is proposed in this paper. Firstly, a divide-and-conquer EMO network pruning method is proposed, which decomposes the complex task of EMO pruning on the entire network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition narrows the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structure converges faster, so the proposed algorithm consumes lower computational resources. Secondly, a sub-network training method based on cross-network constraints is designed, which could bridge independent EMO pruning sub-tasks, allowing them to collaborate better and improving the overall performance of the pruned network. Finally, a multiple sub-networks joint pruning method based on EMO is proposed. This method combines the Pareto Fronts from EMO pruning results on multiple sub-networks through global performance impairment ranking to design a joint pruning scheme. The rich experiments on CIFAR-10/100 and ImageNet-100/1k are conducted. The proposed algorithm achieves a comparable performance with the state-of-the-art pruning methods. △ Less

Submitted 30 December, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.03707 [pdf]

Hybrid quantum-classical convolutional neural network for phytoplankton classification

Authors: Shangshang Shi, Zhimin Wang, Ruimin Shang, Yanan Li, Jiaxin Li, Guoqiang Zhong, Yongjian Gu

Abstract: The taxonomic composition and abundance of phytoplankton, having direct impact on marine ecosystem dynamic and global environment change, are listed as essential ocean variables. Phytoplankton classification is very crucial for Phytoplankton analysis, but it is very difficult because of the huge amount and tiny volume of Phytoplankton. Machine learning is the principle way of performing phytoplank… ▽ More The taxonomic composition and abundance of phytoplankton, having direct impact on marine ecosystem dynamic and global environment change, are listed as essential ocean variables. Phytoplankton classification is very crucial for Phytoplankton analysis, but it is very difficult because of the huge amount and tiny volume of Phytoplankton. Machine learning is the principle way of performing phytoplankton image classification automatically. When carrying out large-scale research on the marine phytoplankton, the volume of data increases overwhelmingly and more powerful computational resources are required for the success of machine learning algorithms. Recently, quantum machine learning has emerged as the potential solution for large-scale data processing by harnessing the exponentially computational power of quantum computer. Here, for the first time, we demonstrate the feasibility of quantum deep neural networks for phytoplankton classification. Hybrid quantum-classical convolutional and residual neural networks are developed based on the classical architectures. These models make a proper balance between the limited function of the current quantum devices and the large size of phytoplankton images, which make it possible to perform phytoplankton classification on the near-term quantum computers. Better performance is obtained by the quantum-enhanced models against the classical counterparts. In particular, quantum models converge much faster than classical ones. The present quantum models are versatile, and can be applied for various tasks of image classification in the field of marine science. △ Less

Submitted 7 March, 2023; originally announced March 2023.

Comments: 20 pages, 13 figures

arXiv:2302.03244 [pdf, other]

Quantum Recurrent Neural Networks for Sequential Learning

Authors: Yanan Li, Zhimin Wang, Rongbing Han, Shangshang Shi, Jiaxin Li, Ruimin Shang, Haiyong Zheng, Guoqiang Zhong, Yongjian Gu

Abstract: Quantum neural network (QNN) is one of the promising directions where the near-term noisy intermediate-scale quantum (NISQ) devices could find advantageous applications against classical resources. Recurrent neural networks are the most fundamental networks for sequential learning, but up to now there is still a lack of canonical model of quantum recurrent neural network (QRNN), which certainly re… ▽ More Quantum neural network (QNN) is one of the promising directions where the near-term noisy intermediate-scale quantum (NISQ) devices could find advantageous applications against classical resources. Recurrent neural networks are the most fundamental networks for sequential learning, but up to now there is still a lack of canonical model of quantum recurrent neural network (QRNN), which certainly restricts the research in the field of quantum deep learning. In the present work, we propose a new kind of QRNN which would be a good candidate as the canonical QRNN model, where, the quantum recurrent blocks (QRBs) are constructed in the hardware-efficient way, and the QRNN is built by stacking the QRBs in a staggered way that can greatly reduce the algorithm's requirement with regard to the coherent time of quantum devices. That is, our QRNN is much more accessible on NISQ devices. Furthermore, the performance of the present QRNN model is verified concretely using three different kinds of classical sequential data, i.e., meteorological indicators, stock price, and text categorization. The numerical experiments show that our QRNN achieves much better performance in prediction (classification) accuracy against the classical RNN and state-of-the-art QNN models for sequential learning, and can predict the changing details of temporal sequence data. The practical circuit structure and superior performance indicate that the present QRNN is a promising learning model to find quantum advantageous applications in the near term. △ Less

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2211.01713 [pdf, other]

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

Authors: Fei Xu, Jianian Xu, Jiabin Chen, Li Chen, Ruitao Shang, Zhi Zhou, Fangming Liu

Abstract: GPUs are essential to accelerating the latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes increasingly compelling. However, GPU sharing inevitably brings severe performance interference among co-located inference workloads, as motivated by an empirical measure… ▽ More GPUs are essential to accelerating the latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes increasingly compelling. However, GPU sharing inevitably brings severe performance interference among co-located inference workloads, as motivated by an empirical measurement study of DNN inference on EC2 GPU instances. While existing works on guaranteeing inference performance service level objectives (SLOs) focus on either temporal sharing of GPUs or reactive GPU resource scaling and inference migration techniques, how to proactively mitigate such severe performance interference has received comparatively little attention. In this paper, we propose iGniter, an interference-aware GPU resource provisioning framework for cost-efficiently achieving predictable DNN inference in the cloud. iGniter is comprised of two key components: (1) a lightweight DNN inference performance model, which leverages the system and workload metrics that are practically accessible to capture the performance interference; (2) A cost-efficient GPU resource provisioning strategy that jointly optimizes the GPU resource allocation and adaptive batching based on our inference performance model, with the aim of achieving predictable performance of DNN inference workloads. We implement a prototype of iGniter based on the NVIDIA Triton inference server hosted on EC2 GPU instances. Extensive prototype experiments on four representative DNN models and datasets demonstrate that iGniter can guarantee the performance SLOs of DNN inference workloads with practically acceptable runtime overhead, while saving the monetary cost by up to 25% in comparison to the state-of-the-art GPU resource provisioning strategies. △ Less

Submitted 3 November, 2022; originally announced November 2022.

Comments: 16 pages, 21 figures, submitted to IEEE Transactions on Parallel and Distributed Systems

arXiv:2112.03456 [pdf, other]

RSBNet: One-Shot Neural Architecture Search for A Backbone Network in Remote Sensing Image Recognition

Authors: Cheng Peng, Yangyang Li, Ronghua Shang, Licheng Jiao

Abstract: Recently, a massive number of deep learning based approaches have been successfully applied to various remote sensing image (RSI) recognition tasks. However, most existing advances of deep learning methods in the RSI field heavily rely on the features extracted by the manually designed backbone network, which severely hinders the potential of deep learning models due the complexity of RSI and the… ▽ More Recently, a massive number of deep learning based approaches have been successfully applied to various remote sensing image (RSI) recognition tasks. However, most existing advances of deep learning methods in the RSI field heavily rely on the features extracted by the manually designed backbone network, which severely hinders the potential of deep learning models due the complexity of RSI and the limitation of prior knowledge. In this paper, we research a new design paradigm for the backbone architecture in RSI recognition tasks, including scene classification, land-cover classification and object detection. A novel one-shot architecture search framework based on weight-sharing strategy and evolutionary algorithm is proposed, called RSBNet, which consists of three stages: Firstly, a supernet constructed in a layer-wise search space is pretrained on a self-assembled large-scale RSI dataset based on an ensemble single-path training strategy. Next, the pre-trained supernet is equipped with different recognition heads through the switchable recognition module and respectively fine-tuned on the target dataset to obtain task-specific supernet. Finally, we search the optimal backbone architecture for different recognition tasks based on the evolutionary algorithm without any network training. Extensive experiments have been conducted on five benchmark datasets for different recognition tasks, the results show the effectiveness of the proposed search paradigm and demonstrate that the searched backbone is able to flexibly adapt different RSI recognition tasks and achieve impressive performance. △ Less

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2107.11678 [pdf]

Deep-learning-driven Reliable Single-pixel Imaging with Uncertainty Approximation

Authors: Ruibo Shang, Mikaela A. O'Brien, Geoffrey P. Luke

Abstract: Single-pixel imaging (SPI) has the advantages of high-speed acquisition over a broad wavelength range and system compactness, which are difficult to achieve by conventional imaging sensors. However, a common challenge is low image quality arising from undersampling. Deep learning (DL) is an emerging and powerful tool in computational imaging for many applications and researchers have applied DL in… ▽ More Single-pixel imaging (SPI) has the advantages of high-speed acquisition over a broad wavelength range and system compactness, which are difficult to achieve by conventional imaging sensors. However, a common challenge is low image quality arising from undersampling. Deep learning (DL) is an emerging and powerful tool in computational imaging for many applications and researchers have applied DL in SPI to achieve higher image quality than conventional reconstruction approaches. One outstanding challenge, however, is that the accuracy of DL predictions in SPI cannot be assessed in practical applications where the ground truths are unknown. Here, we propose the use of the Bayesian convolutional neural network (BCNN) to approximate the uncertainty (coming from finite training data and network model) of the DL predictions in SPI. Each pixel in the predicted result from BCNN represents the parameter of a probability distribution rather than the image intensity value. Then, the uncertainty can be approximated with BCNN by minimizing a negative log-likelihood loss function in the training stage and Monte Carlo dropout in the prediction stage. The results show that the BCNN can reliably approximate the uncertainty of the DL predictions in SPI with varying compression ratios and noise levels. The predicted uncertainty from BCNN in SPI reveals that most of the reconstruction errors in deep-learning-based SPI come from the edges of the image features. The results show that the proposed BCNN can provide a reliable tool to approximate the uncertainty of DL predictions in SPI and can be widely used in many applications of SPI. △ Less

Submitted 24 July, 2021; originally announced July 2021.

Comments: 19 pages, 4 figures

arXiv:2001.03493 [pdf]

doi 10.1364/OE.424165

A Two-step-training Deep Learning Framework for Real-time Computational Imaging without Physics Priors

Authors: Ruibo Shang, Kevin Hoffer-Hawlik, Geoffrey P. Luke

Abstract: Deep learning (DL) is a powerful tool in computational imaging for many applications. A common strategy is to reconstruct a preliminary image as the input of a neural network to achieve an optimized image. Usually, the preliminary image is acquired with the prior knowledge of the imaging model. One outstanding challenge, however, is the degree to which the actual imaging model deviates from the as… ▽ More Deep learning (DL) is a powerful tool in computational imaging for many applications. A common strategy is to reconstruct a preliminary image as the input of a neural network to achieve an optimized image. Usually, the preliminary image is acquired with the prior knowledge of the imaging model. One outstanding challenge, however, is the degree to which the actual imaging model deviates from the assumed model. Model mismatches degrade the quality of the preliminary image and therefore affect the DL predictions. Another main challenge is that since most imaging inverse problems are ill-posed and the networks are over-parameterized, DL networks have flexibility to extract features from the data that are not directly related to the imaging model. To solve these challenges, a two-step-training DL (TST-DL) framework is proposed for real-time computational imaging without physics priors. First, a single fully-connected layer (FCL) is trained to directly learn the model. Then, this FCL is fixed and concatenated with an un-trained U-Net architecture for a second-step training to improve the output image fidelity, resulting in four main advantages. First, it does not rely on an accurate representation of the imaging model since the model is directly learned. Second, real-time imaging can be achieved. Third, the TST-DL network is trained in the desired direction and the predictions are improved since the first step is constrained to learn the model and the second step improves the result by learning the optimal regularizer. Fourth, the approach accommodates any size and dimensionality of data. We demonstrate this framework using a linear single-pixel camera imaging model. The results are quantitatively compared with those from other DL frameworks and model-based iterative optimization approaches. We further extend this concept to nonlinear models in the application of image de-autocorrelation. △ Less

Submitted 18 August, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: 16 pages, 12 figures

Showing 1–12 of 12 results for author: Shang, R