Search | arXiv e-print repository

Targeted Mining Precise-positioning Episode Rules

Authors: Jian Zhu, Xiaoye Chen, Wensheng Gan, Zefeng Chen, Philip S. Yu

Abstract: The era characterized by an exponential increase in data has led to the widespread adoption of data intelligence as a crucial task. Within the field of data mining, frequent episode mining has emerged as an effective tool for extracting valuable and essential information from event sequences. Various algorithms have been developed to discover frequent episodes and subsequently derive episode rules… ▽ More The era characterized by an exponential increase in data has led to the widespread adoption of data intelligence as a crucial task. Within the field of data mining, frequent episode mining has emerged as an effective tool for extracting valuable and essential information from event sequences. Various algorithms have been developed to discover frequent episodes and subsequently derive episode rules using the frequency function and anti-monotonicity principles. However, currently, there is a lack of algorithms specifically designed for mining episode rules that encompass user-specified query episodes. To address this challenge and enable the mining of target episode rules, we introduce the definition of targeted precise-positioning episode rules and formulate the problem of targeted mining precise-positioning episode rules. Most importantly, we develop an algorithm called Targeted Mining Precision Episode Rules (TaMIPER) to address the problem and optimize it using four proposed strategies, leading to significant reductions in both time and space resource requirements. As a result, TaMIPER offers high accuracy and efficiency in mining episode rules of user interest and holds promising potential for prediction tasks in various domains, such as weather observation, network intrusion, and e-commerce. Experimental results on six real datasets demonstrate the exceptional performance of TaMIPER. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: IEEE TETCI, 14 pages

arXiv:2405.13055 [pdf, other]

Large Language Models for Medicine: A Survey

Authors: Yanxin Zheng, Wensheng Gan, Zefeng Chen, Zhenlian Qi, Qian Liang, Philip S. Yu

Abstract: To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, w… ▽ More To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, we review LLM developments, focusing on the requirements and applications of medical LLMs. We provide a concise overview of existing models, aiming to explore advanced research directions and benefit researchers for future medical applications. We emphasize the advantages of medical LLMs in applications, as well as the challenges encountered during their development. Finally, we suggest directions for technical integration to mitigate challenges and potential research directions for the future of medical LLMs, aiming to meet the demands of the medical field better. △ Less

Submitted 19 May, 2024; originally announced May 2024.

Comments: Preprint. 5 figures,5 tables

arXiv:2405.13001 [pdf, other]

Large Language Models for Education: A Survey

Authors: Hanyi Xu, Wensheng Gan, Zhenlian Qi, Jiayang Wu, Philip S. Yu

Abstract: Artificial intelligence (AI) has a profound impact on traditional education. In recent years, large language models (LLMs) have been increasingly used in various applications such as natural language processing, computer vision, speech recognition, and autonomous driving. LLMs have also been applied in many fields, including recommendation, finance, government, education, legal affairs, and financ… ▽ More Artificial intelligence (AI) has a profound impact on traditional education. In recent years, large language models (LLMs) have been increasingly used in various applications such as natural language processing, computer vision, speech recognition, and autonomous driving. LLMs have also been applied in many fields, including recommendation, finance, government, education, legal affairs, and finance. As powerful auxiliary tools, LLMs incorporate various technologies such as deep learning, pre-training, fine-tuning, and reinforcement learning. The use of LLMs for smart education (LLMEdu) has been a significant strategic direction for countries worldwide. While LLMs have shown great promise in improving teaching quality, changing education models, and modifying teacher roles, the technologies are still facing several challenges. In this paper, we conduct a systematic review of LLMEdu, focusing on current technologies, challenges, and future developments. We first summarize the current state of LLMEdu and then introduce the characteristics of LLMs and education, as well as the benefits of integrating LLMs into education. We also review the process of integrating LLMs into the education industry, as well as the introduction of related technologies. Finally, we discuss the challenges and problems faced by LLMEdu, as well as prospects for future optimization of LLMEdu. △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: Journal of Machine Learning and Cybernetics. 4 tables, 6 figures

arXiv:2405.12496 [pdf, other]

A Survey of Integrating Wireless Technology into Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead… ▽ More Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead of using microphone arrays, which increase the computation complexity of the ANC system, to isolate multiple noise sources to improve noise reduction performance, the application of the wireless technique avoids extra computation demand. Wireless transmissions of reference, error, and control signals are also applied to improve the convergence performance of the ANC system. Furthermore, this paper lists some wireless ANC applications, such as earbuds, headphones, windows, and headrests, underscoring their adaptability and efficiency in various settings. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.07536 [pdf, other]

Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator

Authors: Xin Li, Wenyang Gan, Pang Wen, Daqi Zhu

Abstract: To deal with the task assignment problem of multi-AUV systems under kinematic constraints, which means steering capability constraints for underactuated AUVs or other vehicles likely, an improved task assignment algorithm is proposed combining the Dubins Path algorithm with improved SOM neural network algorithm. At first, the aimed tasks are assigned to the AUVs by improved SOM neural network meth… ▽ More To deal with the task assignment problem of multi-AUV systems under kinematic constraints, which means steering capability constraints for underactuated AUVs or other vehicles likely, an improved task assignment algorithm is proposed combining the Dubins Path algorithm with improved SOM neural network algorithm. At first, the aimed tasks are assigned to the AUVs by improved SOM neural network method based on workload balance and neighborhood function. When there exists kinematic constraints or obstacles which may cause failure of trajectory planning, task re-assignment will be implemented by change the weights of SOM neurals, until the AUVs can have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. AUV's yaw angle is limited, which result in new assignments to the targets. Computation flow is designed so that the algorithm in MATLAB and Python can realizes the path planning to multiple targets. Finally, simulation results prove that the proposed algorithm can effectively accomplish the task assignment task for multi-AUV system. △ Less

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.18428 [pdf, other]

Geospatial Big Data: Survey and Challenges

Authors: Jiayang Wu, Wensheng Gan, Han-Chieh Chao, Philip S. Yu

Abstract: In recent years, geospatial big data (GBD) has obtained attention across various disciplines, categorized into big earth observation data and big human behavior data. Identifying geospatial patterns from GBD has been a vital research focus in the fields of urban management and environmental sustainability. This paper reviews the evolution of GBD mining and its integration with advanced artificial… ▽ More In recent years, geospatial big data (GBD) has obtained attention across various disciplines, categorized into big earth observation data and big human behavior data. Identifying geospatial patterns from GBD has been a vital research focus in the fields of urban management and environmental sustainability. This paper reviews the evolution of GBD mining and its integration with advanced artificial intelligence (AI) techniques. GBD consists of data generated by satellites, sensors, mobile devices, and geographical information systems, and we categorize geospatial data based on different perspectives. We outline the process of GBD mining and demonstrate how it can be incorporated into a unified framework. Additionally, we explore new technologies like large language models (LLM), the Metaverse, and knowledge graphs, and how they could make GBD even more useful. We also share examples of GBD hel** with city management and protecting the environment. Finally, we discuss the real challenges that come up when working with GBD, such as issues with data retrieval and security. Our goal is to give readers a clear view of where GBD mining stands today and where it might go next. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: IEEE JSTARS. 14 pages, 5 figures

arXiv:2403.18139 [pdf, other]

Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model

Authors: Weijie Gan, Huidong Xie, Carl von Gall, Günther Platsch, Michael T. Jurkiewicz, Andrea Andrade, Udunna C. Anazodo, Ulugbek S. Kamilov, Hongyu An, Jorge Cabello

Abstract: Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then use the DPM-generated T1w-MRI to guide the PET re… ▽ More Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then use the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans, and tested in datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded than the acquired MRI images. Regarding PET image quality, volume of interest analysis in different brain regions showed that both PET reconstructed images using the acquired and the deep-MRI images improved image quality compared to OSEM. Same conclusions were found analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2402.09460 [pdf, other]

doi 10.1109/ICASSP48485.2024.10448277

Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may intro… ▽ More Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may introduce some biases. In this paper, we propose an unsupervised-GFANC approach to simplify the 1D CNN training process and enhance its practicality. During training, the co-processor and real-time controller are integrated into an end-to-end differentiable ANC system. This enables us to use the accumulated squared error signal as the loss for training the 1D CNN. With this unsupervised learning paradigm, the unsupervised-GFANC method not only omits the labelling process but also exhibits better noise reduction performance compared to the supervised GFANC method in real noise experiments. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

arXiv:2402.02694 [pdf, other]

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug… ▽ More Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task, in recent years, has achieved substantial progress in device generalization, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabelled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift. △ Less

Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.13998 [pdf, other]

WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification

Authors: Haitao Gan, Lingchao Fu, Ran Zhou, Weiyan Gan, Furong Wang, Xiaoyan Wu, Zhi Yang, Zhongwei Huang

Abstract: The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this appro… ▽ More The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this approach relies on obtaining a substantial amount of challenging-to-acquire segmentation annotations. This paper proposes a novel weakly supervised auxiliary task learning network model (WAL-Net) to explore the interdependence between carotid plaque classification and segmentation tasks. The plaque classification task is primary task, while the plaque segmentation task serves as an auxiliary task, providing valuable information to enhance the performance of the primary task. Weakly supervised learning is adopted in the auxiliary task to completely break away from the dependence on segmentation annotations. Experiments and evaluations are conducted on a dataset comprising 1270 carotid plaque ultrasound images from Wuhan University Zhongnan Hospital. Results indicate that the proposed method achieved an approximately 1.3% improvement in carotid plaque classification accuracy compared to the baseline network. Specifically, the accuracy of mixed-echoic plaques classification increased by approximately 3.3%, demonstrating the effectiveness of our approach. △ Less

Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.08678 [pdf, other]

Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines. This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Comments: Submitted to ICASSP 2024

arXiv:2401.01599 [pdf, other]

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Authors: Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin

Abstract: The generalization error curve of certain kernel regression method aims at determining the exact order of generalization error with various source condition, noise level and choice of the regularization parameter rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method… ▽ More The generalization error curve of certain kernel regression method aims at determining the exact order of generalization error with various source condition, noise level and choice of the regularization parameter rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we could sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, etc. Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training the wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.10073 [pdf, other]

Data Scarcity in Recommendation Systems: A Survey

Authors: Zefeng Chen, Wensheng Gan, Jiayang Wu, Kaixia Hu, Hong Lin

Abstract: The prevalence of online content has led to the widespread adoption of recommendation systems (RSs), which serve diverse purposes such as news, advertisements, and e-commerce recommendations. Despite their significance, data scarcity issues have significantly impaired the effectiveness of existing RS models and hindered their progress. To address this challenge, the concept of knowledge transfer,… ▽ More The prevalence of online content has led to the widespread adoption of recommendation systems (RSs), which serve diverse purposes such as news, advertisements, and e-commerce recommendations. Despite their significance, data scarcity issues have significantly impaired the effectiveness of existing RS models and hindered their progress. To address this challenge, the concept of knowledge transfer, particularly from external sources like pre-trained language models, emerges as a potential solution to alleviate data scarcity and enhance RS development. However, the practice of knowledge transfer in RSs is intricate. Transferring knowledge between domains introduces data disparities, and the application of knowledge transfer in complex RS scenarios can yield negative consequences if not carefully designed. Therefore, this article contributes to this discourse by addressing the implications of data scarcity on RSs and introducing various strategies, such as data augmentation, self-supervised learning, transfer learning, broad learning, and knowledge graph utilization, to mitigate this challenge. Furthermore, it delves into the challenges and future direction within the RS domain, offering insights that are poised to facilitate the development and implementation of robust RSs, particularly when confronted with data scarcity. We aim to provide valuable guidance and inspiration for researchers and practitioners, ultimately driving advancements in the field of RS. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: ACM Transactions on Recommender Systems, 32 pages

arXiv:2312.03718 [pdf, other]

Large Language Models in Law: A Survey

Authors: **qi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, Philip S. Yu

Abstract: The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI wi… ▽ More The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges. △ Less

Submitted 25 November, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2311.18810 [pdf, other]

Convergence of Nonconvex PnP-ADMM with MMSE Denoisers

Authors: Chicago Park, Shirin Shoushtari, Weijie Gan, Ulugbek S. Kamilov

Abstract: Plug-and-Play Alternating Direction Method of Multipliers (PnP-ADMM) is a widely-used algorithm for solving inverse problems by integrating physical measurement models and convolutional neural network (CNN) priors. PnP-ADMM has been theoretically proven to converge for convex data-fidelity terms and nonexpansive CNNs. It has however been observed that PnP-ADMM often empirically converges even for… ▽ More Plug-and-Play Alternating Direction Method of Multipliers (PnP-ADMM) is a widely-used algorithm for solving inverse problems by integrating physical measurement models and convolutional neural network (CNN) priors. PnP-ADMM has been theoretically proven to converge for convex data-fidelity terms and nonexpansive CNNs. It has however been observed that PnP-ADMM often empirically converges even for expansive CNNs. This paper presents a theoretical explanation for the observed stability of PnP-ADMM based on the interpretation of the CNN prior as a minimum mean-squared error (MMSE) denoiser. Our explanation parallels a similar argument recently made for the iterative shrinkage/thresholding algorithm variant of PnP (PnP-ISTA) and relies on the connection between MMSE denoisers and proximal operators. We also numerically evaluate the performance gap between PnP-ADMM using a nonexpansive DnCNN denoiser and expansive DRUNet denoiser, thus motivating the use of expansive CNNs. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.15445 [pdf, other]

FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration

Authors: Zihao Zou, Jiaming Liu, Shirin Shoushtari, Yubo Wang, Weijie Gan, Ulugbek S. Kamilov

Abstract: Face video restoration (FVR) is a challenging but important problem where one seeks to recover a perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces.… ▽ More Face video restoration (FVR) is a challenging but important problem where one seeks to recover a perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces. We present a new conditional diffusion framework called FLAIR for FVR. FLAIR ensures temporal consistency across frames in a computationally efficient fashion by converting a traditional image DPM into a video DPM. The proposed conversion uses a recurrent video refinement layer and a temporal self-attention at different scales. FLAIR also uses a conditional iterative refinement process to balance the perceptual and distortion quality during inference. This process consists of two key components: a data-consistency module that analytically ensures that the generated video precisely matches its degraded observation and a coarse-to-fine image enhancement module specifically for facial regions. Our extensive experiments show superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets. △ Less

Submitted 26 November, 2023; originally announced November 2023.

Comments: 32 pages, 27 figures

arXiv:2311.13165 [pdf, other]

Multimodal Large Language Models: A Survey

Authors: Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu

Abstract: The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining various modalities, enabling a more comprehensive understanding of divers… ▽ More The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining various modalities, enabling a more comprehensive understanding of diverse data. This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms. Furthermore, we introduce a range of multimodal products, focusing on the efforts of major technology companies. A practical guide is provided, offering insights into the technical aspects of multimodal models. Moreover, we present a compilation of the latest algorithms and commonly used datasets, providing researchers with valuable resources for experimentation and evaluation. Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development. By addressing these aspects, this paper aims to facilitate a deeper understanding of multimodal models and their potential in various domains. △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: IEEE BigData 2023. 10 pages

arXiv:2311.13160 [pdf, other]

Large Language Models in Education: Vision and Opportunities

Authors: Wensheng Gan, Zhenlian Qi, Jiayang Wu, Jerry Chun-Wei Lin

Abstract: With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications… ▽ More With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications of LLMs in the field of digital/smart education have broad prospects. The research on educational large models (EduLLMs) is constantly evolving, providing new methods and approaches to achieve personalized learning, intelligent tutoring, and educational assessment goals, thereby improving the quality of education and the learning experience. This article aims to investigate and summarize the application of LLMs in smart education. It first introduces the research background and motivation of LLMs and explains the essence of LLMs. It then discusses the relationship between digital education and EduLLMs and summarizes the current research status of educational large models. The main contributions are the systematic summary and vision of the research background, motivation, and application of large models for education (LLM4Edu). By reviewing existing research, this article provides guidance and insights for educators, researchers, and policy-makers to gain a deep understanding of the potential and challenges of LLM4Edu. It further provides guidance for further advancing the development and application of LLM4Edu, while still facing technical, ethical, and practical challenges requiring further research and exploration. △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: IEEE BigData 2023. 10 pages

arXiv:2311.10945 [pdf, other]

An Empirical Bayes Framework for Open-Domain Dialogue Generation

Authors: **g Yang Lee, Kong Aik Lee, Woon-Seng Gan

Abstract: To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue. Despite recent advancements, which can be attributed to the usage of pretrained language models, the generation of diverse and coherent dialogue remains an open research problem. A popular approach to address this issue involves the adaptation of variat… ▽ More To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue. Despite recent advancements, which can be attributed to the usage of pretrained language models, the generation of diverse and coherent dialogue remains an open research problem. A popular approach to address this issue involves the adaptation of variational frameworks. However, while these approaches successfully improve diversity, they tend to compromise on contextual coherence. Hence, we propose the Bayesian Open-domain Dialogue with Empirical Bayes (BODEB) framework, an empirical bayes framework for constructing an Bayesian open-domain dialogue agent by leveraging pretrained parameters to inform the prior and posterior parameter distributions. Empirical results show that BODEB achieves better results in terms of both diversity and coherence compared to variational frameworks. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.10943 [pdf, other]

Partially Randomizing Transformer Weights for Dialogue Response Diversity

Authors: **g Yang Lee, Kong Aik Lee, Woon-Seng Gan

Abstract: Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via either novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during… ▽ More Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via either novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference, or a significant increase in model size and complexity. Hence, we propose the \underline{Pa}rtially \underline{Ra}ndomized trans\underline{Former} (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaformer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in model complexity. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.07226 [pdf, other]

Large Language Models for Robotics: A Survey

Authors: Fanlong Zeng, Wensheng Gan, Yongheng Wang, Ning Liu, Philip S. Yu

Abstract: The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered incr… ▽ More The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning. We first provide an overview of the background and development of LLMs for robotics, followed by a description of the benefits of LLMs for robotics and recent advancements in robotics models based on LLMs. We then delve into the various techniques used in the model, including those employed in perception, decision-making, control, and interaction. Finally, we explore the applications of LLMs in robotics and some potential challenges they may face in the near future. Embodied intelligence is the future of intelligent science, and LLMs-based robotics is one of the promising but challenging paths to achieve this. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: Preprint. 4 figures, 3 tables

arXiv:2311.05804 [pdf, other]

Model-as-a-Service (MaaS): A Survey

Authors: Wensheng Gan, Shicheng Wan, Philip S. Yu

Abstract: Due to the increased number of parameters and data in the pre-trained model exceeding a certain level, a foundation model (e.g., a large language model) can significantly improve downstream task performance and emerge with some novel special abilities (e.g., deep learning, complex reasoning, and human alignment) that were not present before. Foundation models are a form of generative artificial in… ▽ More Due to the increased number of parameters and data in the pre-trained model exceeding a certain level, a foundation model (e.g., a large language model) can significantly improve downstream task performance and emerge with some novel special abilities (e.g., deep learning, complex reasoning, and human alignment) that were not present before. Foundation models are a form of generative artificial intelligence (GenAI), and Model-as-a-Service (MaaS) has emerged as a groundbreaking paradigm that revolutionizes the deployment and utilization of GenAI models. MaaS represents a paradigm shift in how we use AI technologies and provides a scalable and accessible solution for developers and users to leverage pre-trained AI models without the need for extensive infrastructure or expertise in model training. In this paper, the introduction aims to provide a comprehensive overview of MaaS, its significance, and its implications for various industries. We provide a brief review of the development history of "X-as-a-Service" based on cloud computing and present the key technologies involved in MaaS. The development of GenAI models will become more democratized and flourish. We also review recent application studies of MaaS. Finally, we highlight several challenges and future issues in this promising area. MaaS is a new deployment and service paradigm for different AI-based models. We hope this review will inspire future research in the field of MaaS. △ Less

Submitted 9 November, 2023; originally announced November 2023.

Comments: Preprint. 3 figures, 1 tables

arXiv:2311.02121 [pdf, other]

Enhancing Monocular Height Estimation from Aerial Images with Street-view Images

Authors: Xiaomou Hou, Wanshui Gan, Naoto Yokoya

Abstract: Accurate height estimation from monocular aerial imagery presents a significant challenge due to its inherently ill-posed nature. This limitation is rooted in the absence of adequate geometric constraints available to the model when training with monocular imagery. Without additional geometric information to supplement the monocular image data, the model's ability to provide reliable estimations i… ▽ More Accurate height estimation from monocular aerial imagery presents a significant challenge due to its inherently ill-posed nature. This limitation is rooted in the absence of adequate geometric constraints available to the model when training with monocular imagery. Without additional geometric information to supplement the monocular image data, the model's ability to provide reliable estimations is compromised. In this paper, we propose a method that enhances monocular height estimation by incorporating street-view images. Our insight is that street-view images provide a distinct viewing perspective and rich structural details of the scene, serving as geometric constraints to enhance the performance of monocular height estimation. Specifically, we aim to optimize an implicit 3D scene representation, density field, with geometry constraints from street-view images, thereby improving the accuracy and robustness of height estimation. Our experimental results demonstrate the effectiveness of our proposed method, outperforming the baseline and offering significant improvements in terms of accuracy and structural consistency. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.02003 [pdf, other]

A Structured Pruning Algorithm for Model-based Deep Learning

Authors: Chicago Park, Weijie Gan, Zihao Zou, Yuyang Hu, Zhixin Sun, Ulugbek S. Kamilov

Abstract: There is a growing interest in model-based deep learning (MBDL) for solving imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural net (CNNs). The iterative nature of MBDL networks increases the test-time computational complexity, which limits the… ▽ More There is a growing interest in model-based deep learning (MBDL) for solving imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural net (CNNs). The iterative nature of MBDL networks increases the test-time computational complexity, which limits their applicability in certain large-scale applications. We address this issue by presenting structured pruning algorithm for model-based deep learning (SPADE) as the first structured pruning algorithm for MBDL networks. SPADE reduces the computational complexity of CNNs used within MBDL networks by pruning its non-essential weights. We propose three distinct strategies to fine-tune the pruned MBDL networks to minimize the performance loss. Each fine-tuning strategy has a unique benefit that depends on the presence of a pre-trained model and a high-quality ground truth. We validate SPADE on two distinct inverse problems, namely compressed sensing MRI and image super-resolution. Our results highlight that MBDL models pruned by SPADE can achieve substantial speed up in testing time while maintaining competitive performance. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.00230 [pdf, other]

DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

Authors: Gaoshuang Huang, Yang Zhou, Xiaofei Hu, Chenglong Zhang, Luying Zhao, Wenjian Gan, Mingbo Hou

Abstract: Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is… ▽ More Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is generally unsatisfactory. In this study, we utilize the DINOv2 model as the backbone network for trimming and fine-tuning to extract robust image features. We propose a novel VPR architecture called DINO-Mix, which combines a foundational vision model with feature aggregation. This architecture relies on the powerful image feature extraction capabilities of foundational vision models. We employ an MLP-Mixer-based mix module to aggregate image features, resulting in globally robust and generalizable descriptors that enable high-precision VPR. We experimentally demonstrate that the proposed DINO-Mix architecture significantly outperforms current state-of-the-art (SOTA) methods. In test sets having lighting variations, seasonal changes, and occlusions (Tokyo24/7, Nordland, SF-XL-Testv1), our proposed DINO-Mix architecture achieved Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively. Compared with SOTA methods, our architecture exhibited an average accuracy improvement of 5.14%. △ Less

Submitted 5 December, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Under review / Open source code

arXiv:2310.13699 [pdf, other]

Interaction in Metaverse: A Survey

Authors: Hong Lin, Zirun Gan, Wensheng Gan, Zhenlian Qi, Yuehua Wang, Philip S. Yu

Abstract: Human-computer interaction (HCI) emerged with the birth of the computer and has been upgraded through decades of development. Metaverse has attracted a lot of interest with its immersive experience, and HCI is the entrance to the Metaverse for people. It is predictable that HCI will determine the immersion of the Metaverse. However, the technologies of HCI in Metaverse are not mature enough. There… ▽ More Human-computer interaction (HCI) emerged with the birth of the computer and has been upgraded through decades of development. Metaverse has attracted a lot of interest with its immersive experience, and HCI is the entrance to the Metaverse for people. It is predictable that HCI will determine the immersion of the Metaverse. However, the technologies of HCI in Metaverse are not mature enough. There are many issues that we should address for HCI in the Metaverse. To this end, the purpose of this paper is to provide a systematic literature review on the key technologies and applications of HCI in the Metaverse. This paper is a comprehensive survey of HCI for the Metaverse, focusing on current technology, future directions, and challenges. First, we provide a brief overview of HCI in the Metaverse and their mutually exclusive relationships. Then, we summarize the evolution of HCI and its future characteristics in the Metaverse. Next, we envision and present the key technologies involved in HCI in the Metaverse. We also review recent case studies of HCI in the Metaverse. Finally, we highlight several challenges and future issues in this promising area. △ Less

Submitted 27 September, 2023; originally announced October 2023.

Comments: Preprint. 3 figures, 3 tables

arXiv:2310.07504 [pdf, other]

PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction

Authors: Weijie Gan, Qiuchen Zhai, Michael Thompson McCann, Cristina Garcia Cardona, Ulugbek S. Kamilov, Brendt Wohlberg

Abstract: Ptychography is an imaging technique that captures multiple overlap** snapshots of a sample, illuminated coherently by a moving localized probe. The image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from measured diffraction patterns. However, these iterative approaches have high computational cost. In… ▽ More Ptychography is an imaging technique that captures multiple overlap** snapshots of a sample, illuminated coherently by a moving localized probe. The image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from measured diffraction patterns. However, these iterative approaches have high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking into consideration their mutual correlations. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV is capable of outperforming existing deep learning methods for this problem, and significantly reduces computational cost compared to iterative methodologies, while maintaining competitive performance. △ Less

Submitted 6 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.16102 [pdf, other]

Discovering Utility-driven Interval Rules

Authors: Chunkai Zhang, Maohua Lyu, Huai** Hao, Wensheng Gan, Philip S. Yu

Abstract: For artificial intelligence, high-utility sequential rule mining (HUSRM) is a knowledge discovery method that can reveal the associations between events in the sequences. Recently, abundant methods have been proposed to discover high-utility sequence rules. However, the existing methods are all related to point-based sequences. Interval events that persist for some time are common. Traditional int… ▽ More For artificial intelligence, high-utility sequential rule mining (HUSRM) is a knowledge discovery method that can reveal the associations between events in the sequences. Recently, abundant methods have been proposed to discover high-utility sequence rules. However, the existing methods are all related to point-based sequences. Interval events that persist for some time are common. Traditional interval-event sequence knowledge discovery tasks mainly focus on pattern discovery, but patterns cannot reveal the correlation between interval events well. Moreover, the existing HUSRM algorithms cannot be directly applied to interval-event sequences since the relation in interval-event sequences is much more intricate than those in point-based sequences. In this work, we propose a utility-driven interval rule mining (UIRMiner) algorithm that can extract all utility-driven interval rules (UIRs) from the interval-event sequence database to solve the problem. In UIRMiner, we first introduce a numeric encoding relation representation, which can save much time on relation computation and storage on relation representation. Furthermore, to shrink the search space, we also propose a complement pruning strategy, which incorporates the utility upper bound with the relation. Finally, plentiful experiments implemented on both real-world and synthetic datasets verify that UIRMiner is an effective and efficient algorithm. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: Preprint. 11 figures, 5 tables

arXiv:2308.07767 [pdf, other]

Preliminary investigation of the short-term in situ performance of an automatic masker selection system

Authors: Bhan Lam, Zhen-Ting Ong, Kenneth Ooi, Wen-Hui Ong, Trevor Wong, Karn N. Watcharasupat, Woon-Seng Gan

Abstract: Soundscape augmentation or "masking" introduces wanted sounds into the acoustic environment to improve acoustic comfort. Usually, the masker selection and playback strategies are either arbitrary or based on simple rules (e.g. -3 dBA), which may lead to sub-optimal increment or even reduction in acoustic comfort for dynamic acoustic environments. To reduce ambiguity in the selection of maskers, an… ▽ More Soundscape augmentation or "masking" introduces wanted sounds into the acoustic environment to improve acoustic comfort. Usually, the masker selection and playback strategies are either arbitrary or based on simple rules (e.g. -3 dBA), which may lead to sub-optimal increment or even reduction in acoustic comfort for dynamic acoustic environments. To reduce ambiguity in the selection of maskers, an automatic masker selection system (AMSS) was recently developed. The AMSS uses a deep-learning model trained on a large-scale dataset of subjective responses to maximize the derived ISO pleasantness (ISO 12913-2). Hence, this study investigates the short-term in situ performance of the AMSS implemented in a gazebo in an urban park. Firstly, the predicted ISO pleasantness from the AMSS is evaluated in comparison to the in situ subjective evaluation scores. Secondly, the effect of various masker selection schemes on the perceived affective quality and appropriateness would be evaluated. In total, each participant evaluated 6 conditions: (1) ambient environment with no maskers; (2) AMSS; (3) bird and (4) water masker from prior art; (5) random selection from same pool of maskers used to train the AMSS; and (6) selection of best-performing maskers based on the analysis of the dataset used to train the AMSS. △ Less

Submitted 15 August, 2023; originally announced August 2023.

Comments: paper submitted to the 52nd International Congress and Exposition on Noise Control Engineering held in Chiba, Greater Tokyo, Japan, on 20-23 August 2023 (Inter-Noise 2023)

ACM Class: J.2; J.4

arXiv:2308.03684 [pdf, other]

Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm

Authors: Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen

Abstract: Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filter-x least mean square (FxLMS) algorithm gradually becomes the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of deal… ▽ More Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filter-x least mean square (FxLMS) algorithm gradually becomes the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of dealing with quickly varying disturbances, such as piling noise. Furthermore, the noise power variation also deteriorates the robustness of the algorithm when it adopts the fixed step size. To solve these issues, we integrated the normalized multichannel FxLMS with the momentum method, which hence, effectively avoids the interference of the primary noise power and accelerates the convergence of the algorithm. To validate its effectiveness, we deployed this algorithm in a multichannel noise control window to control the real machine noise. △ Less

Submitted 7 August, 2023; originally announced August 2023.

Comments: Conference: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2020 At Korea Volume: 261

arXiv:2307.05533 [pdf, other]

doi 10.1016/j.scs.2023.104763

Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

Authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan

Abstract: Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines… ▽ More Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines the perceptual and objective aspects of an active-noise-control (ANC)-based "anti-noise" window (ANW) and its integration with informational masking (IM) in a model bedroom. Forty participants assessed the ANW in a three-way interaction involving noise types (traffic, train, and aircraft), maskers (bird, water), and ANC (on, off). The evaluation focused on perceived annoyance (PAY; ISO/TS 15666), perceived affective quality (ISO/TS 12913-2), loudness (PLN), and included an open-ended qualitative assessment. Despite minimal objective reduction in decibel-based indicators and a slight increase in psychoacoustic sharpness, the ANW alone demonstrated significant reductions in PAY and PLN, as well as an improvement in ISO pleasantness across all noise types. The addition of maskers generally enhanced overall acoustic comfort, although water masking led to increased PLN. Furthermore, the combination of ANC with maskers showed interaction effects, with both maskers significantly reducing PAY compared to ANC alone. △ Less

Submitted 8 July, 2023; originally announced July 2023.

Comments: Accepted manuscript submitted to Sustainable Cities and Society

Journal ref: Sustain. Cities Soc., 104763, 2023

arXiv:2306.06470 [pdf, other]

TALENT: Targeted Mining of Non-overlap** Sequential Patterns

Authors: Zefeng Chen, Wensheng Gan, Gengsen Huang, Zhenlian Qi, Yan Li, Philip S. Yu

Abstract: With the widespread application of efficient pattern mining algorithms, sequential patterns that allow gap constraints have become a valuable tool to discover knowledge from biological data such as DNA and protein sequences. Among all kinds of gap-constrained mining, non-overlap** sequence mining can mine interesting patterns and satisfy the anti-monotonic property (the Apriori property). Howeve… ▽ More With the widespread application of efficient pattern mining algorithms, sequential patterns that allow gap constraints have become a valuable tool to discover knowledge from biological data such as DNA and protein sequences. Among all kinds of gap-constrained mining, non-overlap** sequence mining can mine interesting patterns and satisfy the anti-monotonic property (the Apriori property). However, existing algorithms do not search for targeted sequential patterns, resulting in unnecessary and redundant pattern generation. Targeted pattern mining can not only mine patterns that are more interesting to users but also reduce the unnecessary redundant sequence generated, which can greatly avoid irrelevant computation. In this paper, we define and formalize the problem of targeted non-overlap** sequential pattern mining and propose an algorithm named TALENT (TArgeted mining of sequentiaL pattErN with consTraints). Two search methods including breadth-first and depth-first searching are designed to troubleshoot the generation of patterns. Furthermore, several pruning strategies to reduce the reading of sequences and items in the data and terminate redundant pattern extensions are presented. Finally, we select a series of datasets with different characteristics and conduct extensive experiments to compare the TALENT algorithm with the existing algorithms for mining non-overlap** sequential patterns. The experimental results demonstrate that the proposed targeted mining algorithm, TALENT, has excellent mining efficiency and can deal efficiently with many different query settings. △ Less

Submitted 10 June, 2023; originally announced June 2023.

Comments: Preprint. 9 figures, 5 tables

arXiv:2305.12672 [pdf, other]

Block Coordinate Plug-and-Play Methods for Blind Inverse Problems

Authors: Weijie Gan, Shirin Shoushtari, Yuyang Hu, Jiaming Liu, Hongyu An, Ulugbek S. Kamilov

Abstract: Plug-and-play (PnP) prior is a well-known class of methods for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image denoisers. While PnP methods have been extensively used for image recovery with known measurement operators, there is little work on PnP for solving blind inverse problems. We address this gap by presenting a… ▽ More Plug-and-play (PnP) prior is a well-known class of methods for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image denoisers. While PnP methods have been extensively used for image recovery with known measurement operators, there is little work on PnP for solving blind inverse problems. We address this gap by presenting a new block-coordinate PnP (BC-PnP) method that efficiently solves this joint estimation problem by introducing learned denoisers as priors on both the unknown image and the unknown measurement operator. We present a new convergence theory for BC-PnP compatible with blind inverse problems by considering nonconvex data-fidelity terms and expansive denoisers. Our theory analyzes the convergence of BC-PnP to a stationary point of an implicit function associated with an approximate minimum mean-squared error (MMSE) denoiser. We numerically validate our method on two blind inverse problems: automatic coil sensitivity estimation in magnetic resonance imaging (MRI) and blind image deblurring. Our results show that BC-PnP provides an efficient and principled framework for using denoisers as PnP priors for jointly estimating measurement operators and images. △ Less

Submitted 26 October, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2304.13931 [pdf, other]

Open Metaverse: Issues, Evolution, and Future

Authors: Zefeng Chen, Wensheng Gan, Jiayi Sun, Jiayang Wu, Philip S. Yu

Abstract: With the evolution of content on the web and the Internet, there is a need for cyberspace that can be used to work, live, and play in digital worlds regardless of geography. The Metaverse provides the possibility of future Internet and represents a future trend. In the future, the Metaverse will be a space where the real and the virtual are combined. In this article, we have a comprehensive survey… ▽ More With the evolution of content on the web and the Internet, there is a need for cyberspace that can be used to work, live, and play in digital worlds regardless of geography. The Metaverse provides the possibility of future Internet and represents a future trend. In the future, the Metaverse will be a space where the real and the virtual are combined. In this article, we have a comprehensive survey of the compelling Metaverse. We introduce computer technology, the history of the Internet, and the promise of the Metaverse as the next generation of the Internet. In addition, we briefly introduce the related concepts of the Metaverse, including novel terms like trusted Metaverse, human-intelligence Metaverse, personalized Metaverse, AI-enabled Metaverse, Metaverse-as-a-service, etc. Moreover, we present the challenges of the Metaverse such as limited resources and ethical issues. We also present Metaverse's promising directions, including lightweight Metaverse and autonomous Metaverse. We hope this survey will provide some helpful prospects and insightful directions about the Metaverse to related developments. △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: Preprint. 6 figures, 2 tables

arXiv:2304.11947 [pdf, other]

Towards Top-$K$ Non-Overlap** Sequential Patterns

Authors: Zefeng Chen, Wensheng Gan, Gengsen Huang, Yan Li, Zhenlian Qi

Abstract: Sequential pattern mining (SPM) has excellent prospects and application spaces and has been widely used in different fields. The non-overlap** SPM, as one of the data mining techniques, has been used to discover patterns that have requirements for gap constraints in some specific mining tasks, such as bio-data mining. And for the non-overlap** sequential patterns with gap constraints, the Nett… ▽ More Sequential pattern mining (SPM) has excellent prospects and application spaces and has been widely used in different fields. The non-overlap** SPM, as one of the data mining techniques, has been used to discover patterns that have requirements for gap constraints in some specific mining tasks, such as bio-data mining. And for the non-overlap** sequential patterns with gap constraints, the Nettree structure has been proposed to efficiently compute the support of the patterns. For pattern mining, users usually need to consider the threshold of minimum support (\textit{minsup}). This is especially difficult in the case of large databases. Although some existing algorithms can mine the top-$k$ patterns, they are approximate algorithms with fixed lengths. In this paper, a precise algorithm for mining \underline{T}op-$k$ \underline{N}on-\underline{O}verlap** \underline{S}equential \underline{P}atterns (TNOSP) is proposed. The top-$k$ solution of SPM is an effective way to discover the most frequent non-overlap** sequential patterns without having to set the \textit{minsup}. As a novel pattern mining algorithm, TNOSP can precisely search the top-$k$ patterns of non-overlap** sequences with different gap constraints. We further propose a pruning strategy named \underline{Q}ueue \underline{M}eta \underline{S}et \underline{P}runing (QMSP) to improve TNOSP's performance. TNOSP can reduce redundancy in non-overlap** sequential mining and has better performance in mining precise non-overlap** sequential patterns. The experimental results and comparisons on several datasets have shown that TNOSP outperformed the existing algorithms in terms of precision, efficiency, and scalability. △ Less

Submitted 24 April, 2023; originally announced April 2023.

Comments: Preprint. 5 figures, 5 tables

arXiv:2304.06632 [pdf, other]

AI-Generated Content (AIGC): A Survey

Authors: Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Hong Lin

Abstract: To address the challenges of digital intelligence in the digital economy, artificial intelligence-generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace manual content generation by generating content based on user-inputted keywords or requirements. The development of large model algorithms has significantly strengthened the capabilities of AIGC, which makes A… ▽ More To address the challenges of digital intelligence in the digital economy, artificial intelligence-generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace manual content generation by generating content based on user-inputted keywords or requirements. The development of large model algorithms has significantly strengthened the capabilities of AIGC, which makes AIGC products a promising generative tool and adds convenience to our lives. As an upstream technology, AIGC has unlimited potential to support different downstream applications. It is important to analyze AIGC's current capabilities and shortcomings to understand how it can be best utilized in future applications. Therefore, this paper provides an extensive overview of AIGC, covering its definition, essential conditions, cutting-edge capabilities, and advanced features. Moreover, it discusses the benefits of large-scale pre-trained models and the industrial chain of AIGC. Furthermore, the article explores the distinctions between auxiliary generation and automatic generation within AIGC, providing examples of text generation. The paper also examines the potential integration of AIGC with the Metaverse. Lastly, the article highlights existing issues and suggests some future directions for application. △ Less

Submitted 25 March, 2023; originally announced April 2023.

Comments: Preprint. 14 figures, 4 tables

arXiv:2304.06111 [pdf, other]

Web3: The Next Internet Revolution

Authors: Shicheng Wan, Hong Lin, Wensheng Gan, Jiahui Chen, Philip S. Yu

Abstract: Since the first appearance of the World Wide Web, people more rely on the Web for their cyber social activities. The second phase of World Wide Web, named Web 2.0, has been extensively attracting worldwide people that participate in building and enjoying the virtual world. Nowadays, the next internet revolution: Web3 is going to open new opportunities for traditional social models. The decentraliz… ▽ More Since the first appearance of the World Wide Web, people more rely on the Web for their cyber social activities. The second phase of World Wide Web, named Web 2.0, has been extensively attracting worldwide people that participate in building and enjoying the virtual world. Nowadays, the next internet revolution: Web3 is going to open new opportunities for traditional social models. The decentralization property of Web3 is capable of breaking the monopoly of the internet companies. Moreover, Web3 will lead a paradigm shift from the Web as a publishing medium to a medium of interaction and participation. This change will deeply transform the relations among users and platforms, forces and relations of production, and the global economy. Therefore, it is necessary that we technically, practically, and more broadly take an overview of Web3. In this paper, we present a comprehensive survey of Web3, with a focus on current technologies, challenges, opportunities, and outlook. This article first introduces several major technologies of Web3. Then, we illustrate the type of Web3 applications in detail. Blockchain and smart contracts ensure that decentralized organizations will be less trusted and more truthful than that centralized organizations. Decentralized finance will be global, and open with financial inclusiveness for unbanked people. This paper also discusses the relationship between the Metaverse and Web3, as well as the differences and similarities between Web 3.0 and Web3. Inspired by the Maslow's hierarchy of needs theory, we further conduct a novel hierarchy of needs theory within Web3. Finally, several worthwhile future research directions of Web3 are discussed. △ Less

Submitted 22 March, 2023; originally announced April 2023.

Comments: Preprint. 5 figures, 2 tables

arXiv:2304.06032 [pdf, other]

Web 3.0: The Future of Internet

Authors: Wensheng Gan, Zhenqiang Ye, Shicheng Wan, Philip S. Yu

Abstract: With the rapid growth of the Internet, human daily life has become deeply bound to the Internet. To take advantage of massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scholars and engineers have worked hard to make the in… ▽ More With the rapid growth of the Internet, human daily life has become deeply bound to the Internet. To take advantage of massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scholars and engineers have worked hard to make the internet world more open, inclusive, and equal. Indeed, the next generation of Web evolution (i.e., Web 3.0) is already coming and sha** our lives. Web 3.0 is a decentralized Web architecture that is more intelligent and safer than before. The risks and ruin posed by monopolists or criminals will be greatly reduced by a complete reconstruction of the Internet and IT infrastructure. In a word, Web 3.0 is capable of addressing web data ownership according to distributed technology. It will optimize the internet world from the perspectives of economy, culture, and technology. Then it promotes novel content production methods, organizational structures, and economic forms. However, Web 3.0 is not mature and is now being disputed. Herein, this paper presents a comprehensive survey of Web 3.0, with a focus on current technologies, challenges, opportunities, and outlook. This article first introduces a brief overview of the history of World Wide Web as well as several differences among Web 1.0, Web 2.0, Web 3.0, and Web3. Then, some technical implementations of Web 3.0 are illustrated in detail. We discuss the revolution and benefits that Web 3.0 brings. Finally, we explore several challenges and issues in this promising area. △ Less

Submitted 23 March, 2023; originally announced April 2023.

Comments: ACM Web Conference 2023

arXiv:2304.03679 [pdf, other]

T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Authors: Xiaohui Xie, Qian Dong, Bingning Wang, Feiyang Lv, Ting Yao, Weinan Gan, Zhi**g Wu, Xiangsheng Li, Haitao Li, Yiqun Liu, ** Ma

Abstract: Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine… ▽ More Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine-grained relevance annotation and false negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false negative issues, more passages with higher diversities are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and XML files of documents which passages are generated from, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and there is still scope for improvement. The full data and all codes are available at https://github.com/THUIR/T2Ranking/ △ Less

Submitted 7 April, 2023; originally announced April 2023.

Comments: This Resource paper has been accepted by the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

arXiv:2304.01689 [pdf, other]

Privacy-Preserving Federated Discovery of DNA Motifs with Differential Privacy

Authors: Yao Chen, Wensheng Gan, Gengsen Huang, Yongdong Wu, Philip S. Yu

Abstract: DNA motif discovery is an important issue in gene research, which aims to identify transcription factor binding sites (i.e., motifs) in DNA sequences to reveal the mechanisms that regulate gene expression. However, the phenomenon of data silos and the problem of privacy leakage have seriously hindered the development of DNA motif discovery. On the one hand, the phenomenon of data silos makes data… ▽ More DNA motif discovery is an important issue in gene research, which aims to identify transcription factor binding sites (i.e., motifs) in DNA sequences to reveal the mechanisms that regulate gene expression. However, the phenomenon of data silos and the problem of privacy leakage have seriously hindered the development of DNA motif discovery. On the one hand, the phenomenon of data silos makes data collection difficult. On the other hand, the collection and use of DNA data become complicated and difficult because DNA is sensitive private information. In this context, how discovering DNA motifs under the premise of ensuring privacy and security and alleviating data silos has become a very important issue. Therefore, this paper proposes a novel method, namely DP-FLMD, to address this problem. Note that this is the first application of federated learning to the field of genetics research. The federated learning technique is used to solve the problem of data silos. It has the advantage of enabling multiple participants to train models together and providing privacy protection services. To address the challenges of federated learning in terms of communication costs, this paper applies a sampling method and a strategy for reducing communication costs to DP-FLMD. In addition, differential privacy, a privacy protection technique with rigorous mathematical proof, is also applied to DP-FLMD. Experiments on the DNA datasets show that DP-FLMD has high mining accuracy and runtime efficiency, and the performance of the algorithm is affected by some parameters. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Comments: Preprint. 7 figures, 1 table

arXiv:2303.17987 [pdf, other]

Federated Learning for Metaverse: A Survey

Authors: Yao Chen, Shan Huang, Wensheng Gan, Gengsen Huang, Yongdong Wu

Abstract: The metaverse, which is at the stage of innovation and exploration, faces the dilemma of data collection and the problem of private data leakage in the process of development. This can seriously hinder the widespread deployment of the metaverse. Fortunately, federated learning (FL) is a solution to the above problems. FL is a distributed machine learning paradigm with privacy-preserving features d… ▽ More The metaverse, which is at the stage of innovation and exploration, faces the dilemma of data collection and the problem of private data leakage in the process of development. This can seriously hinder the widespread deployment of the metaverse. Fortunately, federated learning (FL) is a solution to the above problems. FL is a distributed machine learning paradigm with privacy-preserving features designed for a large number of edge devices. Federated learning for metaverse (FL4M) will be a powerful tool. Because FL allows edge devices to participate in training tasks locally using their own data, computational power, and model-building capabilities. Applying FL to the metaverse not only protects the data privacy of participants but also reduces the need for high computing power and high memory on servers. Until now, there have been many studies about FL and the metaverse, respectively. In this paper, we review some of the early advances of FL4M, which will be a research direction with unlimited development potential. We first introduce the concepts of metaverse and FL, respectively. Besides, we discuss the convergence of key metaverse technologies and FL in detail, such as big data, communication technology, the Internet of Things, edge computing, blockchain, and extended reality. Finally, we discuss some key challenges and promising directions of FL4M in detail. In summary, we hope that our up-to-date brief survey can help people better understand FL4M and build a fair, open, and secure metaverse. △ Less

Submitted 23 March, 2023; originally announced March 2023.

Comments: ACM Web Conference 2023

arXiv:2303.14510 [pdf, other]

Targeted Mining of Top-k High Utility Itemsets

Authors: Shan Huang, Wensheng Gan, **bao Miao, Xuming Han, Philippe Fournier-Viger

Abstract: Finding high-importance patterns in data is an emerging data mining task known as High-utility itemset mining (HUIM). Given a minimum utility threshold, a HUIM algorithm extracts all the high-utility itemsets (HUIs) whose utility values are not less than the threshold. This can reveal a wealth of useful information, but the precise needs of users are not well taken into account. In particular, use… ▽ More Finding high-importance patterns in data is an emerging data mining task known as High-utility itemset mining (HUIM). Given a minimum utility threshold, a HUIM algorithm extracts all the high-utility itemsets (HUIs) whose utility values are not less than the threshold. This can reveal a wealth of useful information, but the precise needs of users are not well taken into account. In particular, users often want to focus on patterns that have some specific items rather than find all patterns. To overcome that difficulty, targeted mining has emerged, focusing on user preferences, but only preliminary work has been conducted. For example, the targeted high-utility itemset querying algorithm (TargetUM) was proposed, which uses a lexicographic tree to query itemsets containing a target pattern. However, selecting the minimum utility threshold is difficult when the user is not familiar with the processed database. As a solution, this paper formulates the task of targeted mining of the top-k high-utility itemsets and proposes an efficient algorithm called TMKU based on the TargetUM algorithm to discover the top-k target high-utility itemsets (top-k THUIs). At the same time, several pruning strategies are used to reduce memory consumption and execution time. Extensive experiments show that the proposed TMKU algorithm has good performance on real and synthetic datasets. △ Less

Submitted 25 March, 2023; originally announced March 2023.

Comments: Preprint. 5 figures, 5 tables

arXiv:2303.11123 [pdf, other]

The Human-Centric Metaverse: A Survey

Authors: Riyan Yang, Lin Li, Wensheng Gan, Zefeng Chen, Zhenlian Qi

Abstract: In the era of the Web of Things, the Metaverse is expected to be the landing site for the next generation of the Internet, resulting in the increased popularity of related technologies and applications in recent years and gradually becoming the focus of Internet research. The Metaverse, as a link between the real and virtual worlds, can provide users with immersive experiences. As the concept of t… ▽ More In the era of the Web of Things, the Metaverse is expected to be the landing site for the next generation of the Internet, resulting in the increased popularity of related technologies and applications in recent years and gradually becoming the focus of Internet research. The Metaverse, as a link between the real and virtual worlds, can provide users with immersive experiences. As the concept of the Metaverse grows in popularity, many scholars and developers begin to focus on the Metaverse's ethics and core. This paper argues that the Metaverse should be centered on humans. That is, humans constitute the majority of the Metaverse. As a result, we begin this paper by introducing the Metaverse's origins, characteristics, related technologies, and the concept of the human-centric Metaverse (HCM). Second, we discuss the manifestation of human-centric in the Metaverse. Finally, we discuss some current issues in the construction of HCM. In this paper, we provide a detailed review of the applications of human-centric technologies in the Metaverse, as well as the relevant HCM application scenarios. We hope that this paper can provide researchers and developers with some directions and ideas for human-centric Metaverse construction. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: ACM Web Conference 2023

arXiv:2303.10076 [pdf, other]

A Simple Framework for 3D Occupancy Estimation in Autonomous Driving

Authors: Wanshui Gan, Ningkai Mo, Hongbin Xu, Naoto Yokoya

Abstract: The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. In this work, we present a simple framework for 3D occupancy estima… ▽ More The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. In this work, we present a simple framework for 3D occupancy estimation, which is a CNN-based framework designed to reveal several key factors for 3D occupancy estimation, such as network design, optimization, and evaluation. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation and 3D reconstruction, which could advance the study of 3D perception in autonomous driving. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish the benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and Nuscenes datasets and achieve competitive performance. The relevant code will be updated in https://github.com/GANWANSHUI/SimpleOccupancy. △ Less

Submitted 16 November, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 15 pages, 8 figures

arXiv:2303.08379 [pdf]

doi 10.1109/ICASSP49357.2023.10096536

Implementing Continuous HRTF Measurement in Near-Field

Authors: Ee-Leng Tan, Santi Peksi, Woon-Seng Gan

Abstract: Head-related transfer function (HRTF) is an essential component to create an immersive listening experience over headphones for virtual reality (VR) and augmented reality (AR) applications. Metaverse combines VR and AR to create immersive digital experiences, and users are very likely to interact with virtual objects in the near-field (NF). The HRTFs of such objects are highly individualized and d… ▽ More Head-related transfer function (HRTF) is an essential component to create an immersive listening experience over headphones for virtual reality (VR) and augmented reality (AR) applications. Metaverse combines VR and AR to create immersive digital experiences, and users are very likely to interact with virtual objects in the near-field (NF). The HRTFs of such objects are highly individualized and dependent on directions and distances. Hence, a significant number of HRTF measurements at different distances in the NF would be needed. Using conventional static stop-and-go HRTF measurement methods to acquire these measurements would be time-consuming and tedious for human listeners. In this paper, we propose a continuous measurement system targeted for the NF, and efficiently capturing HRTFs in the horizontal plane within 45 secs. Comparative experiments are performed on head and torso simulator (HATS) and human listeners to evaluate system consistency and robustness. △ Less

Submitted 15 June, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 5 pages, 9 figures, Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2303.08342 [pdf, other]

doi 10.1109/ICASSP49357.2023.10094866

Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs

Authors: Kenneth Ooi, Karn N. Watcharasupat, Bhan Lam, Zhen-Ting Ong, Woon-Seng Gan

Abstract: Autonomous soundscape augmentation systems typically use trained models to pick optimal maskers to effect a desired perceptual change. While acoustic information is paramount to such systems, contextual information, including participant demographics and the visual environment, also influences acoustic perception. Hence, we propose modular modifications to an existing attention-based deep neural n… ▽ More Autonomous soundscape augmentation systems typically use trained models to pick optimal maskers to effect a desired perceptual change. While acoustic information is paramount to such systems, contextual information, including participant demographics and the visual environment, also influences acoustic perception. Hence, we propose modular modifications to an existing attention-based deep neural network, to allow early, mid-level, and late feature fusion of participant-linked, visual, and acoustic features. Ablation studies on module configurations and corresponding fusion methods using the ARAUS dataset show that contextual features improve the model performance in a statistically significant manner on the normalized ISO Pleasantness, to a mean squared error of $0.1194\pm0.0012$ for the best-performing all-modality model, against $0.1217\pm0.0009$ for the audio-only model. Soundscape augmentation systems can thereby leverage multimodal inputs for improved performance. We also investigate the impact of individual participant-linked factors using trained models to illustrate improvements in model explainability. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: 5 pages, 2 figures. Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1-5

arXiv:2303.07820 [pdf, other]

Adaptive Rotated Convolution for Rotated Object Detection

Authors: Yifan Pu, Yiru Wang, Zhuofan Xia, Yizeng Han, Yulin Wang, Weihao Gan, Zidong Wang, Shiji Song, Gao Huang

Abstract: Rotated object detection aims to identify and locate objects in images with arbitrary orientation. In this scenario, the oriented directions of objects vary considerably across different images, while multiple orientations of objects exist within an image. This intrinsic characteristic makes it challenging for standard backbone networks to extract high-quality features of these arbitrarily orienta… ▽ More Rotated object detection aims to identify and locate objects in images with arbitrary orientation. In this scenario, the oriented directions of objects vary considerably across different images, while multiple orientations of objects exist within an image. This intrinsic characteristic makes it challenging for standard backbone networks to extract high-quality features of these arbitrarily orientated objects. In this paper, we present Adaptive Rotated Convolution (ARC) module to handle the aforementioned challenges. In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images, and an efficient conditional computation mechanism is introduced to accommodate the large orientation variations of objects within an image. The two designs work seamlessly in rotated object detection problem. Moreover, ARC can conveniently serve as a plug-and-play module in various vision backbones to boost their representation ability to detect oriented objects accurately. Experiments on commonly used benchmarks (DOTA and HRSC2016) demonstrate that equipped with our proposed ARC module in the backbone network, the performance of multiple popular oriented object detectors is significantly improved (\eg +3.03\% mAP on Rotated RetinaNet and +4.16\% on CFA). Combined with the highly competitive method Oriented R-CNN, the proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77\% mAP. Code is available at \url{https://github.com/LeapLabTHU/ARC}. △ Less

Submitted 21 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: ICCV 2023

arXiv:2303.05788 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095205

Deep Generative Fixed-filter Active Noise Control

Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Junwei Ji, Woon-Seng Gan

Abstract: Due to the slow convergence and poor tracking ability, conventional LMS-based adaptive algorithms are less capable of handling dynamic noises. Selective fixed-filter active noise control (SFANC) can significantly reduce response time by selecting appropriate pre-trained control filters for different noises. Nonetheless, the limited number of pre-trained control filters may affect noise reduction p… ▽ More Due to the slow convergence and poor tracking ability, conventional LMS-based adaptive algorithms are less capable of handling dynamic noises. Selective fixed-filter active noise control (SFANC) can significantly reduce response time by selecting appropriate pre-trained control filters for different noises. Nonetheless, the limited number of pre-trained control filters may affect noise reduction performance, especially when the incoming noise differs much from the initial noises during pre-training. Therefore, a generative fixed-filter active noise control (GFANC) method is proposed in this paper to overcome the limitation. Based on deep learning and a perfect-reconstruction filter bank, the GFANC method only requires a few prior data (one pre-trained broadband control filter) to automatically generate suitable control filters for various noises. The efficacy of the GFANC method is demonstrated by numerical simulations on real-recorded noises. △ Less

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023. Code will be available after publication

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2212.14255 [pdf, other]

HUSP-SP: Faster Utility Mining on Sequence Data

Authors: Chunkai Zhang, Yuting Yang, Zilin Du, Wensheng Gan, Philip S. Yu

Abstract: High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addr… ▽ More High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability. △ Less

Submitted 29 December, 2022; originally announced December 2022.

Comments: ACM TKDD, 7 figures, 2 tables

arXiv:2212.10452 [pdf, other]

Towards Sequence Utility Maximization under Utility Occupancy Measure

Authors: Gengsen Huang, Wensheng Gan, Philip S. Yu

Abstract: The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. In the database, although utility is a flexible criteri… ▽ More The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to the neglect of utility sharing. This leads to the derived patterns only exploring partial and local knowledge from a database. Utility occupancy is a recently proposed model that considers the problem of mining with high utility but low occupancy. However, existing studies are concentrated on itemsets that do not reveal the temporal relationship of object occurrences. Therefore, this paper towards sequence utility maximization. We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining (HUOSPM). Three dimensions, including frequency, utility, and occupancy, are comprehensively evaluated in HUOSPM. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed. Furthermore, two data structures for storing related information about a pattern, Utility-Occupancy-List-Chain (UOL-Chain) and Utility-Occupancy-Table (UO-Table) with six associated upper bounds, are designed to improve efficiency. Empirical experiments are carried out to evaluate the novel algorithm's efficiency and effectiveness. The influence of different upper bounds and pruning strategies is analyzed and discussed. The comprehensive results suggest that the work of our algorithm is intelligent and effective. △ Less

Submitted 20 December, 2022; originally announced December 2022.

Comments: Preprint. 7 figures, 8 tables

Showing 1–50 of 163 results for author: Gan, W