Search | arXiv e-print repository

arXiv:2406.19832 [pdf, other]

doi 10.1145/3589334.3645542

MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Authors: Tianjun Yao, Jiaqi Sun, Defu Cao, Kun Zhang, Guangyi Chen

Abstract: Recent works have introduced GNN-to-MLP knowledge distillation (KD) frameworks to combine both GNN's superior performance and MLP's fast inference speed. However, existing KD frameworks are primarily designed for node classification within single graphs, leaving their applicability to graph classification largely unexplored. Two main challenges arise when extending KD for node classification to gr… ▽ More Recent works have introduced GNN-to-MLP knowledge distillation (KD) frameworks to combine both GNN's superior performance and MLP's fast inference speed. However, existing KD frameworks are primarily designed for node classification within single graphs, leaving their applicability to graph classification largely unexplored. Two main challenges arise when extending KD for node classification to graph classification: (1) The inherent sparsity of learning signals due to soft labels being generated at the graph level; (2) The limited expressiveness of student MLPs, especially in datasets with limited input feature spaces. To overcome these challenges, we introduce MuGSI, a novel KD framework that employs Multi-granularity Structural Information for graph classification. Specifically, we propose multi-granularity distillation loss in MuGSI to tackle the first challenge. This loss function is composed of three distinct components: graph-level distillation, subgraph-level distillation, and node-level distillation. Each component targets a specific granularity of the graph structure, ensuring a comprehensive transfer of structural knowledge from the teacher model to the student model. To tackle the second challenge, MuGSI proposes to incorporate a node feature augmentation component, thereby enhancing the expressiveness of the student MLPs and making them more capable learners. We perform extensive experiments across a variety of datasets and different teacher/student model architectures. The experiment results demonstrate the effectiveness, efficiency, and robustness of MuGSI. Codes are publicly available at: \textbf{\url{https://github.com/tianyao-aka/MuGSI}.} △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures. Accepted by TheWebConf2024

ACM Class: I.2.6

arXiv:2406.05033 [pdf, other]

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

Authors: Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa

Abstract: We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical… ▽ More We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical step size $2/λ$, where $λ$ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized nearby the solution: but does this suffice globally? In one dimension, we show that a step size less than $1/λ$ suffices for global convergence. However, for all step sizes between $1/λ$ and the critical step size $2/λ$, one can construct a dataset such that GD converges to a stable cycle. In higher dimensions, this is actually possible even for step sizes less than $1/λ$. Our results show that although local convergence is guaranteed for all step sizes less than the critical step size, global convergence is not, and GD may instead converge to a cycle depending on the initialization. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.21056 [pdf, other]

An Organic Weed Control Prototype using Directed Energy and Deep Learning

Authors: Deng Cao, Hongbo Zhang, Rajveer Dhillon

Abstract: Organic weed control is a vital to improve crop yield with a sustainable approach. In this work, a directed energy weed control robot prototype specifically designed for organic farms is proposed. The robot uses a novel distributed array robot (DAR) unit for weed treatment. Soybean and corn databases are built to train deep learning neural nets to perform weed recognition. The initial deep learnin… ▽ More Organic weed control is a vital to improve crop yield with a sustainable approach. In this work, a directed energy weed control robot prototype specifically designed for organic farms is proposed. The robot uses a novel distributed array robot (DAR) unit for weed treatment. Soybean and corn databases are built to train deep learning neural nets to perform weed recognition. The initial deep learning neural nets show a high performance in classifying crops. The robot uses a patented directed energy plant eradication recipe that is completely organic and UV-C free, with no chemical damage or physical disturbance to the soil. The deep learning can classify 8 common weed species in a soybean field under natural environment with up to 98% accuracy. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2404.09403 [pdf, other]

Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Abstract: Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Different from most tra… ▽ More Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Different from most traditional fusion models that incorporate all modalities identically in neural networks, our model designates a prime modality and regards the remaining modalities as detectors in the information pathway, serving to distill the flow of information. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of multimodal representation learning. Experimental evaluations on the MUStARD, CMU-MOSI, and CMU-MOSEI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks. Remarkably, on the CMU-MOSI dataset, ITHP surpasses human-level performance in the multimodal sentiment binary classification task across all evaluation metrics (i.e., Binary Accuracy, F1 Score, Mean Absolute Error, and Pearson Correlation). △ Less

Submitted 22 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted by ICLR 2024. Camera Ready Version

arXiv:2403.14409 [pdf, other]

Locating and Mitigating Gender Bias in Large Language Models

Authors: Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen

Abstract: Large language models(LLM) are pre-trained on extensive corpora to learn facts and human cognition which contain human preferences. However, this process can inadvertently lead to these models acquiring biases and stereotypes prevalent in society. Prior research has typically tackled the issue of bias through a one-dimensional perspective, concentrating either on locating or mitigating it. This li… ▽ More Large language models(LLM) are pre-trained on extensive corpora to learn facts and human cognition which contain human preferences. However, this process can inadvertently lead to these models acquiring biases and stereotypes prevalent in society. Prior research has typically tackled the issue of bias through a one-dimensional perspective, concentrating either on locating or mitigating it. This limited perspective has created obstacles in facilitating research on bias to synergistically complement and progressively build upon one another. In this study, we integrate the processes of locating and mitigating bias within a unified framework. Initially, we use causal mediation analysis to trace the causal effects of different components' activation within a large language model. Building on this, we propose the LSDM (Least Square Debias Method), a knowledge-editing based method for mitigating gender bias in occupational pronouns, and compare it against two baselines on three gender bias datasets and seven knowledge competency test datasets. The experimental results indicate that the primary contributors to gender bias are the bottom MLP modules acting on the last token of occupational pronouns and the top attention module acting on the final word in the sentence. Furthermore, LSDM mitigates gender bias in the model more effectively than the other baselines, while fully preserving the model's capabilities in all other aspects. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 23 pages, 5 figures

arXiv:2403.14381 [pdf, other]

Editing Knowledge Representation of Language Model via Rephrased Prefix Prompts

Authors: Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen

Abstract: Neural language models (LMs) have been extensively trained on vast corpora to store factual knowledge about various aspects of the world described in texts. Current technologies typically employ knowledge editing methods or specific prompts to modify LM outputs. However, existing knowledge editing methods are costly and inefficient, struggling to produce appropriate text. Additionally, prompt engi… ▽ More Neural language models (LMs) have been extensively trained on vast corpora to store factual knowledge about various aspects of the world described in texts. Current technologies typically employ knowledge editing methods or specific prompts to modify LM outputs. However, existing knowledge editing methods are costly and inefficient, struggling to produce appropriate text. Additionally, prompt engineering is opaque and requires significant effort to find suitable prompts. To address these issues, we introduce a new method called PSPEM (Prefix Soft Prompt Editing Method), that can be used for a lifetime with just one training. It resolves the inefficiencies and generalizability issues in knowledge editing methods and overcomes the opacity of prompt engineering by automatically seeking optimal soft prompts. Specifically, PSPEM utilizes a prompt encoder and an encoding converter to refine key information in prompts and uses prompt alignment techniques to guide model generation, ensuring text consistency and adherence to the intended structure and content, thereby maintaining an optimal balance between efficiency and accuracy. We have validated the effectiveness of PSPEM through knowledge editing and attribute inserting. On the COUNTERFACT dataset, PSPEM achieved nearly 100\% editing accuracy and demonstrated the highest level of fluency. We further analyzed the similarities between PSPEM and original prompts and their impact on the model's internals. The results indicate that PSPEM can serve as an alternative to original prompts, supporting the model in effective editing. △ Less

Submitted 11 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 19pages,3figures

arXiv:2403.06419 [pdf, other]

Causal Multi-Label Feature Selection in Federated Setting

Authors: Yukun Song, Dayuan Cao, Jiali Miao, Shuai Yang, Kui Yu

Abstract: Multi-label feature selection serves as an effective mean for dealing with high-dimensional multi-label data. To achieve satisfactory performance, existing methods for multi-label feature selection often require the centralization of substantial data from multiple sources. However, in Federated setting, centralizing data from all sources and merging them into a single dataset is not feasible. To t… ▽ More Multi-label feature selection serves as an effective mean for dealing with high-dimensional multi-label data. To achieve satisfactory performance, existing methods for multi-label feature selection often require the centralization of substantial data from multiple sources. However, in Federated setting, centralizing data from all sources and merging them into a single dataset is not feasible. To tackle this issue, in this paper, we study a challenging problem of causal multi-label feature selection in federated setting and propose a Federated Causal Multi-label Feature Selection (FedCMFS) algorithm with three novel subroutines. Specifically, FedCMFS first uses the FedCFL subroutine that considers the correlations among label-label, label-feature, and feature-feature to learn the relevant features (candidate parents and children) of each class label while preserving data privacy without centralizing data. Second, FedCMFS employs the FedCFR subroutine to selectively recover the missed true relevant features. Finally, FedCMFS utilizes the FedCFC subroutine to remove false relevant features. The extensive experiments on 8 datasets have shown that FedCMFS is effect for causal multi-label feature selection in federated setting. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.18920 [pdf, other]

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

Authors: Dongliang Cao, Marvin Eisenberger, Nafie El Amrani, Daniel Cremers, Florian Bernard

Abstract: Although 3D shape matching and interpolation are highly interrelated, they are often studied separately and applied sequentially to relate different 3D shapes, thus resulting in sub-optimal performance. In this work we present a unified framework to predict both point-wise correspondences and shape interpolation between 3D shapes. To this end, we combine the deep functional map framework with clas… ▽ More Although 3D shape matching and interpolation are highly interrelated, they are often studied separately and applied sequentially to relate different 3D shapes, thus resulting in sub-optimal performance. In this work we present a unified framework to predict both point-wise correspondences and shape interpolation between 3D shapes. To this end, we combine the deep functional map framework with classical surface deformation models to map shapes in both spectral and spatial domains. On the one hand, by incorporating spatial maps, our method obtains more accurate and smooth point-wise correspondences compared to previous functional map methods for shape matching. On the other hand, by introducing spectral maps, our method gets rid of commonly used but computationally expensive geodesic distance constraints that are only valid for near-isometric shape deformations. Furthermore, we propose a novel test-time adaptation scheme to capture both pose-dominant and shape-dominant deformations. Using different challenging datasets, we demonstrate that our method outperforms previous state-of-the-art methods for both shape matching and interpolation, even compared to supervised approaches. △ Less

Submitted 27 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: accepted by CVPR2024

arXiv:2402.09099 [pdf, other]

Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective

Authors: Xiongye Xiao, Chenyu Zhou, Heng **, Defu Cao, Yaxing Li, Yizhuo Zhou, Shixuan Li, Paul Bogdan

Abstract: Prior studies on the emergence in large models have primarily focused on how the functional capabilities of large language models (LLMs) scale with model size. Our research, however, transcends this traditional paradigm, aiming to deepen our understanding of the emergence within LLMs by placing a special emphasis not just on the model size but more significantly on the complex behavior of neuron i… ▽ More Prior studies on the emergence in large models have primarily focused on how the functional capabilities of large language models (LLMs) scale with model size. Our research, however, transcends this traditional paradigm, aiming to deepen our understanding of the emergence within LLMs by placing a special emphasis not just on the model size but more significantly on the complex behavior of neuron interactions during the training process. By introducing the concepts of "self-organization" and "multifractal analysis," we explore how neuron interactions dynamically evolve during training, leading to "emergence," mirroring the phenomenon in natural systems where simple micro-level interactions give rise to complex macro-level behaviors. To quantitatively analyze the continuously evolving interactions among neurons in large models during training, we propose the Neuron-based Multifractal Analysis (NeuroMFA). Utilizing NeuroMFA, we conduct a comprehensive examination of the emergent behavior in LLMs through the lens of both model size and training process, paving new avenues for research into the emergence in large models. △ Less

Submitted 21 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.06073 [pdf]

LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification

Authors: Di Cao, Xianchen Wang, Junfeng Zhou, Jiakai Zhang, Yan**g Lei, Wenpeng Chen

Abstract: Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art performance at the cost of high computational complexity and slower inference speed, making them difficult to implement in an industrial environment. The Densely Connected Time Delay Neural Network (D-TDNN) with Context Aware Masking (CAM) module has proven to be an efficient structure to reduce complexity while maintaini… ▽ More Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art performance at the cost of high computational complexity and slower inference speed, making them difficult to implement in an industrial environment. The Densely Connected Time Delay Neural Network (D-TDNN) with Context Aware Masking (CAM) module has proven to be an efficient structure to reduce complexity while maintaining system performance. In this paper, we propose a fast and lightweight model, LightCAM, which further adopts a depthwise separable convolution module (DSM) and uses multi-scale feature aggregation (MFA) for feature fusion at different levels. Extensive experiments are conducted on VoxCeleb dataset, the comparative results show that it has achieved an EER of 0.83 and MinDCF of 0.0891 in VoxCeleb1-O, which outperforms the other mainstream speaker verification methods. In addition, complexity analysis further demonstrates that the proposed architecture has lower computational cost and faster inference speed. △ Less

Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05359 [pdf, other]

Prompting with Divide-and-Conquer Program Makes Large Language Models Discerning to Hallucination and Deception

Authors: Yizhou Zhang, Lun Du, Defu Cao, Qiang Fu, Yan Liu

Abstract: Foundation models, such as Large language Models (LLMs), have attracted significant amount of interest due to their large number of applications. However, when handling tasks involving repetitive sub-tasks and/or deceptive contents, such as arithmetic calculation and article-level fake news detection, simple instructional prompts suffer from inaccurate responses. Existing works show that more comp… ▽ More Foundation models, such as Large language Models (LLMs), have attracted significant amount of interest due to their large number of applications. However, when handling tasks involving repetitive sub-tasks and/or deceptive contents, such as arithmetic calculation and article-level fake news detection, simple instructional prompts suffer from inaccurate responses. Existing works show that more complicated prompting strategies, such as Chain-of-Thoughts and Least-to-Most, can unlock LLM's powerful capacity in diverse areas. Recent researches reveal that simple divide-and-conquer prompting strategy, i.e. simply dividing the input sequence to multiple sub-inputs, can also substantially improve LLM's performance in some specific tasks such as misinformation detection. In this paper, we aim at examining the utility of divide-and-conquer prompting strategy and answer on which kind of tasks this strategy gets advantages. Specifically, we provide a theoretic analysis to divide-and-conquer prompting strategy and help us identify the specific tasks where DaC prompting can bring performance boost with theoretic guarantee. We then present two cases (large integer arithmetic and fact verification) where experimental results aligns with our theoretic analysis. △ Less

Submitted 24 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: Preprint

arXiv:2401.01918 [pdf, other]

Distilling Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection

Authors: Haowen Zheng, Dong Cao, **tao Xu, Rui Ai, Weihao Gu, Yang Yang, Yanyan Liang

Abstract: Striking a balance between precision and efficiency presents a prominent challenge in the bird's-eye-view (BEV) 3D object detection. Although previous camera-based BEV methods achieved remarkable performance by incorporating long-term temporal information, most of them still face the problem of low efficiency. One potential solution is knowledge distillation. Existing distillation methods only foc… ▽ More Striking a balance between precision and efficiency presents a prominent challenge in the bird's-eye-view (BEV) 3D object detection. Although previous camera-based BEV methods achieved remarkable performance by incorporating long-term temporal information, most of them still face the problem of low efficiency. One potential solution is knowledge distillation. Existing distillation methods only focus on reconstructing spatial features, while overlooking temporal knowledge. To this end, we propose TempDistiller, a Temporal knowledge Distiller, to acquire long-term memory from a teacher detector when provided with a limited number of frames. Specifically, a reconstruction target is formulated by integrating long-term temporal knowledge through self-attention operation applied to feature teachers. Subsequently, novel features are generated for masked student features via a generator. Ultimately, we utilize this reconstruction target to reconstruct the student features. In addition, we also explore temporal relational knowledge when inputting full frames for the student model. We verify the effectiveness of the proposed method on the nuScenes benchmark. The experimental results show our method obtain an enhancement of +1.6 mAP and +1.1 NDS compared to the baseline, a speed improvement of approximately 6 FPS after compressing temporal knowledge, and the most accurate velocity estimation. △ Less

Submitted 8 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.16797 [pdf, other]

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification

Authors: Ya**g Zhai, Yawen Zeng, Zhiyong Huang, Zheng Qin, Xin **, Da Cao

Abstract: The fine-grained attribute descriptions can significantly supplement the valuable semantic information for person image, which is vital to the success of person re-identification (ReID) task. However, current ReID algorithms typically failed to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. R… ▽ More The fine-grained attribute descriptions can significantly supplement the valuable semantic information for person image, which is vital to the success of person re-identification (ReID) task. However, current ReID algorithms typically failed to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. Recent advances in artificial intelligence generated content have made it possible to automatically generate plentiful fine-grained attribute descriptions and make full use of them. Thereby, this paper explores the potential of using the generated multiple person attributes as prompts in ReID tasks with off-the-shelf (large) models for more accurate retrieval results. To this end, we present a new framework called Multi-Prompts ReID (MP-ReID), based on prompt learning and language models, to fully dip fine attributes to assist ReID task. Specifically, MP-ReID first learns to hallucinate diverse, informative, and promptable sentences for describing the query images. This procedure includes (i) explicit prompts of which attributes a person has and furthermore (ii) implicit learnable prompts for adjusting/conditioning the criteria used towards this person identity matching. Explicit prompts are obtained by ensembling generation models, such as ChatGPT and VQA models. Moreover, an alignment module is designed to fuse multi-prompts (i.e., explicit and implicit ones) progressively and mitigate the cross-modal gap. Extensive experiments on the existing attribute-involved ReID datasets, namely, Market1501 and DukeMTMC-reID, demonstrate the effectiveness and rationality of the proposed MP-ReID solution. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2311.12799 [pdf, other]

A Fine-Grained Image Description Generation Method Based on Joint Objectives

Authors: Yifan Zhang, Chunzhen Lin, Donglin Cao, Dazhen Lin

Abstract: The goal of fine-grained image description generation techniques is to learn detailed information from images and simulate human-like descriptions that provide coherent and comprehensive textual details about the image content. Currently, most of these methods face two main challenges: description repetition and omission. Moreover, the existing evaluation metrics cannot clearly reflect the perform… ▽ More The goal of fine-grained image description generation techniques is to learn detailed information from images and simulate human-like descriptions that provide coherent and comprehensive textual details about the image content. Currently, most of these methods face two main challenges: description repetition and omission. Moreover, the existing evaluation metrics cannot clearly reflect the performance of models on these two issues. To address these challenges, we propose an innovative Fine-grained Image Description Generation model based on Joint Objectives. Furthermore, we introduce new object-based evaluation metrics to more intuitively assess the model's performance in handling description repetition and omission. This novel approach combines visual features at both the image level and object level to maximize their advantages and incorporates an object penalty mechanism to reduce description repetition. Experimental results demonstrate that our proposed method significantly improves the CIDEr evaluation metric, indicating its excellent performance in addressing description repetition and omission issues. △ Less

Submitted 1 September, 2023; originally announced November 2023.

arXiv:2311.06278 [pdf]

doi 10.32996/jmss

Boosting Stock Price Prediction with Anticipated Macro Policy Changes

Authors: Md Sabbirul Haque, Md Shahedul Amin, Jonayet Miah, Duc Minh Cao, Ashiqul Haque Ahmed

Abstract: Prediction of stock prices plays a significant role in aiding the decision-making of investors. Considering its importance, a growing literature has emerged trying to forecast stock prices with improved accuracy. In this study, we introduce an innovative approach for forecasting stock prices with greater accuracy. We incorporate external economic environment-related information along with stock pr… ▽ More Prediction of stock prices plays a significant role in aiding the decision-making of investors. Considering its importance, a growing literature has emerged trying to forecast stock prices with improved accuracy. In this study, we introduce an innovative approach for forecasting stock prices with greater accuracy. We incorporate external economic environment-related information along with stock prices. In our novel approach, we improve the performance of stock price prediction by taking into account variations due to future expected macroeconomic policy changes as investors adjust their current behavior ahead of time based on expected future macroeconomic policy changes. Furthermore, we incorporate macroeconomic variables along with historical stock prices to make predictions. Results from this strongly support the inclusion of future economic policy changes along with current macroeconomic information. We confirm the supremacy of our method over the conventional approach using several tree-based machine-learning algorithms. Results are strongly conclusive across various machine learning models. Our preferred model outperforms the conventional approach with an RMSE value of 1.61 compared to an RMSE value of 1.75 from the conventional approach. △ Less

Submitted 27 October, 2023; originally announced November 2023.

Journal ref: Journal of Mathematics and Statistics Studies, 4(3), 29-34 (2023)

arXiv:2311.00517 [pdf]

Improving Cardiovascular Disease Prediction Through Comparative Analysis of Machine Learning Models: A Case Study on Myocardial Infarction

Authors: Jonayet Miah, Duc M Ca, Md Abu Sayed, Ehsanur Rashid Lipu, Fuad Mahmud, S M Yasir Arafat

Abstract: Cardiovascular disease remains a leading cause of mortality in the contemporary world. Its association with smoking, elevated blood pressure, and cholesterol levels underscores the significance of these risk factors. This study addresses the challenge of predicting myocardial illness, a formidable task in medical research. Accurate predictions are pivotal for refining healthcare strategies. This i… ▽ More Cardiovascular disease remains a leading cause of mortality in the contemporary world. Its association with smoking, elevated blood pressure, and cholesterol levels underscores the significance of these risk factors. This study addresses the challenge of predicting myocardial illness, a formidable task in medical research. Accurate predictions are pivotal for refining healthcare strategies. This investigation conducts a comparative analysis of six distinct machine learning models: Logistic Regression, Support Vector Machine, Decision Tree, Bagging, XGBoost, and LightGBM. The attained outcomes exhibit promise, with accuracy rates as follows: Logistic Regression (81.00%), Support Vector Machine (75.01%), XGBoost (92.72%), LightGBM (90.60%), Decision Tree (82.30%), and Bagging (83.01%). Notably, XGBoost emerges as the top-performing model. These findings underscore its potential to enhance predictive precision for coronary infarction. As the prevalence of cardiovascular risk factors persists, incorporating advanced machine learning techniques holds the potential to refine proactive medical interventions. △ Less

Submitted 1 November, 2023; originally announced November 2023.

Journal ref: 2023 15th International Conference on Innovations in Information Technology (IIT) - Track 2: Artificial Intelligence in Data Science

arXiv:2310.18237

Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks

Authors: Jonayet Miah, Duc M Cao, Md Abu Sayed, Md. Sabbirul Haque

Abstract: Artistic style transfer, a captivating application of generative artificial intelligence, involves fusing the content of one image with the artistic style of another to create unique visual compositions. This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs). By leveraging deep image representations learned by CNNs, we demons… ▽ More Artistic style transfer, a captivating application of generative artificial intelligence, involves fusing the content of one image with the artistic style of another to create unique visual compositions. This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs). By leveraging deep image representations learned by CNNs, we demonstrate how to separate and manipulate image content and style, enabling the synthesis of high-quality images that combine content and style in a harmonious manner. We describe the methodology, including content and style representations, loss computation, and optimization, and showcase experimental results highlighting the effectiveness and versatility of the approach across different styles and content △ Less

Submitted 30 October, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: Incorrectly Input

arXiv:2310.17729 [pdf]

Improving Traffic Density Forecasting in Intelligent Transportation Systems Using Gated Graph Neural Networks

Authors: Razib Hayat Khan, Jonayet Miah, S M Yasir Arafat, M M Mahbubul Syeed, Duc M Ca

Abstract: This study delves into the application of graph neural networks in the realm of traffic forecasting, a crucial facet of intelligent transportation systems. Accurate traffic predictions are vital for functions like trip planning, traffic control, and vehicle routing in such systems. Three prominent GNN architectures Graph Convolutional Networks (Graph Sample and Aggregation) and Gated Graph Neural… ▽ More This study delves into the application of graph neural networks in the realm of traffic forecasting, a crucial facet of intelligent transportation systems. Accurate traffic predictions are vital for functions like trip planning, traffic control, and vehicle routing in such systems. Three prominent GNN architectures Graph Convolutional Networks (Graph Sample and Aggregation) and Gated Graph Neural Networks are explored within the context of traffic prediction. Each architecture's methodology is thoroughly examined, including layer configurations, activation functions,and hyperparameters. The primary goal is to minimize prediction errors, with GGNNs emerging as the most effective choice among the three models. The research outlines outcomes for each architecture, elucidating their predictive performance through root mean squared error and mean absolute error (MAE). Hypothetical results reveal intriguing insights: GCNs display an RMSE of 9.10 and an MAE of 8.00, while GraphSAGE shows improvement with an RMSE of 8.3 and an MAE of 7.5. Gated Graph Neural Networks (GGNNs) exhibit the lowest RMSE at 9.15 and an impressive MAE of 7.1, positioning them as the frontrunner. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.17720 [pdf]

Advancing Brain Tumor Detection: A Thorough Investigation of CNNs, Clustering, and SoftMax Classification in the Analysis of MRI Images

Authors: Jonayet Miah, Duc M Cao, Md Abu Sayed3, Md Siam Taluckder, Md Sabbirul Haque, Fuad Mahmud

Abstract: Brain tumors pose a significant global health challenge due to their high prevalence and mortality rates across all age groups. Detecting brain tumors at an early stage is crucial for effective treatment and patient outcomes. This study presents a comprehensive investigation into the use of Convolutional Neural Networks (CNNs) for brain tumor detection using Magnetic Resonance Imaging (MRI) images… ▽ More Brain tumors pose a significant global health challenge due to their high prevalence and mortality rates across all age groups. Detecting brain tumors at an early stage is crucial for effective treatment and patient outcomes. This study presents a comprehensive investigation into the use of Convolutional Neural Networks (CNNs) for brain tumor detection using Magnetic Resonance Imaging (MRI) images. The dataset, consisting of MRI scans from both healthy individuals and patients with brain tumors, was processed and fed into the CNN architecture. The SoftMax Fully Connected layer was employed to classify the images, achieving an accuracy of 98%. To evaluate the CNN's performance, two other classifiers, Radial Basis Function (RBF) and Decision Tree (DT), were utilized, yielding accuracy rates of 98.24% and 95.64%, respectively. The study also introduced a clustering method for feature extraction, improving CNN's accuracy. Sensitivity, Specificity, and Precision were employed alongside accuracy to comprehensively evaluate the network's performance. Notably, the SoftMax classifier demonstrated the highest accuracy among the categorizers, achieving 99.52% accuracy on test data. The presented research contributes to the growing field of deep learning in medical image analysis. The combination of CNNs and MRI data offers a promising tool for accurately detecting brain tumors, with potential implications for early diagnosis and improved patient care. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Journal ref: JOIV : International Journal on Informatics Visualization, JOIV : Int. J. Inform. Visualization ISSN / E-ISSN 2549-9610 / 2549-9904, 2023

arXiv:2310.11420 [pdf, other]

Revisiting Map Relations for Unsupervised Non-Rigid Shape Matching

Authors: Dongliang Cao, Paul Roetzer, Florian Bernard

Abstract: We propose a novel unsupervised learning approach for non-rigid 3D shape matching. Our approach improves upon recent state-of-the art deep functional map methods and can be applied to a broad range of different challenging scenarios. Previous deep functional map methods mainly focus on feature extraction and aim exclusively at obtaining more expressive features for functional map computation. Howe… ▽ More We propose a novel unsupervised learning approach for non-rigid 3D shape matching. Our approach improves upon recent state-of-the art deep functional map methods and can be applied to a broad range of different challenging scenarios. Previous deep functional map methods mainly focus on feature extraction and aim exclusively at obtaining more expressive features for functional map computation. However, the importance of the functional map computation itself is often neglected and the relationship between the functional map and point-wise map is underexplored. In this paper, we systematically investigate the coupling relationship between the functional map from the functional map solver and the point-wise map based on feature similarity. To this end, we propose a self-adaptive functional map solver to adjust the functional map regularisation for different shape matching scenarios, together with a vertex-wise contrastive loss to obtain more discriminative features. Using different challenging datasets (including non-isometry, topological noise and partiality), we demonstrate that our method substantially outperforms previous state-of-the-art methods. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: 3DV 2024

arXiv:2310.08230 [pdf, other]

Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching

Authors: Paul Roetzer, Ahmed Abbas, Dongliang Cao, Florian Bernard, Paul Swoboda

Abstract: In this work we propose to combine the advantages of learning-based and combinatorial formalisms for 3D shape matching. While learning-based shape matching solutions lead to state-of-the-art matching performance, they do not ensure geometric consistency, so that obtained matchings are locally unsmooth. On the contrary, axiomatic methods allow to take geometric consistency into account by explicitl… ▽ More In this work we propose to combine the advantages of learning-based and combinatorial formalisms for 3D shape matching. While learning-based shape matching solutions lead to state-of-the-art matching performance, they do not ensure geometric consistency, so that obtained matchings are locally unsmooth. On the contrary, axiomatic methods allow to take geometric consistency into account by explicitly constraining the space of valid matchings. However, existing axiomatic formalisms are impractical since they do not scale to practically relevant problem sizes, or they require user input for the initialisation of non-convex optimisation problems. In this work we aim to close this gap by proposing a novel combinatorial solver that combines a unique set of favourable properties: our approach is (i) initialisation free, (ii) massively parallelisable powered by a quasi-Newton method, (iii) provides optimality gaps, and (iv) delivers decreased runtime and globally optimal results for many instances. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: Paul Roetzer and Ahmed Abbas contributed equally

arXiv:2310.07259 [pdf, other]

Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog

Authors: Haoyu Zhang, Meng Liu, Yaowei Wang, Da Cao, Weili Guan, Liqiang Nie

Abstract: In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable progress made by existing approaches, they still face the challenges of incrementally understanding complex dialog history and assimilating video information. In response to these challenges… ▽ More In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable progress made by existing approaches, they still face the challenges of incrementally understanding complex dialog history and assimilating video information. In response to these challenges, we present an iterative search and reasoning framework, which consists of a textual encoder, a visual encoder, and a generator. Specifically, we devise a path search and aggregation strategy in the textual encoder, mining core cues from dialog history that are pivotal to understanding the posed questions. Concurrently, our visual encoder harnesses an iterative reasoning network to extract and emphasize critical visual markers from videos, enhancing the depth of visual comprehension. Finally, we utilize the pre-trained GPT-2 model as our answer generator to decode the mined hidden clues into coherent and contextualized answers. Extensive experiments on three public datasets demonstrate the effectiveness and generalizability of our proposed framework. △ Less

Submitted 22 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04948 [pdf, other]

TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

Authors: Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, Yan Liu

Abstract: The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various t… ▽ More The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the design of prompts to facilitate distribution adaptation in different types of time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods on zero shot setting for a number of time series benchmark datasets. This performance gain is observed not only in scenarios involving previously unseen datasets but also in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework. △ Less

Submitted 2 April, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted by ICLR 2024. Camera Ready Version

arXiv:2309.15877

Neuro-Inspired Hierarchical Multimodal Learning

Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Abstract: Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all… ▽ More Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks. △ Less

Submitted 23 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: I am requesting the withdrawal of this submission due to an inadvertent duplication. The paper was submitted twice under different IDs, which was not intentional. The other submission (arXiv:2404.09403) contains the most updated and comprehensive version of the paper, and I would like to retain that as the sole version on the platform

arXiv:2308.03865 [pdf, other]

DefCor-Net: Physics-Aware Ultrasound Deformation Correction

Authors: Zhongliang Jiang, Yue Zhou, Dongliang Cao, Nassir Navab

Abstract: The recovery of morphologically accurate anatomical images from deformed ones is challenging in ultrasound (US) image acquisition, but crucial to accurate and consistent diagnosis, particularly in the emerging field of computer-assisted diagnosis. This article presents a novel anatomy-aware deformation correction approach based on a coarse-to-fine, multi-scale deep neural network (DefCor-Net). To… ▽ More The recovery of morphologically accurate anatomical images from deformed ones is challenging in ultrasound (US) image acquisition, but crucial to accurate and consistent diagnosis, particularly in the emerging field of computer-assisted diagnosis. This article presents a novel anatomy-aware deformation correction approach based on a coarse-to-fine, multi-scale deep neural network (DefCor-Net). To achieve pixel-wise performance, DefCor-Net incorporates biomedical knowledge by estimating pixel-wise stiffness online using a U-shaped feature extractor. The deformation field is then computed using polynomial regression by integrating the measured force applied by the US probe. Based on real-time estimation of pixel-by-pixel tissue properties, the learning-based approach enables the potential for anatomy-aware deformation correction. To demonstrate the effectiveness of the proposed DefCor-Net, images recorded at multiple locations on forearms and upper arms of six volunteers are used to train and validate DefCor-Net. The results demonstrate that DefCor-Net can significantly improve the accuracy of deformation correction to recover the original geometry (Dice Coefficient: from $14.3\pm20.9$ to $82.6\pm12.1$ when the force is $6N$). △ Less

Submitted 7 August, 2023; originally announced August 2023.

Comments: Accepted by MedIA. code is available

arXiv:2306.05257 [pdf, other]

doi 10.1093/bib/bbad235

Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction

Authors: Xuan Lin, Lichang Dai, Yafang Zhou, Zu-Guo Yu, Wen Zhang, Jian-Yu Shi, Dong-Sheng Cao, Li Zeng, Haowen Chen, Bosheng Song, Philip S. Yu, Xiangxiang Zeng

Abstract: Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction… ▽ More Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply the advanced AI and deep learning, the developer and user meet various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, NLP based and hybrid methods, providing an updated and accessible guide to the broad researchers and development community with different domain knowledge. We introduce widely-used molecular representation and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDIs prediction. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by Briefings in Bioinformatics

arXiv:2306.01980 [pdf, other]

doi 10.1109/TSMC.2023.3283021.

Milestones in Autonomous Driving and Intelligent Vehicles Part II: Perception and Planning

Authors: Long Chen, Siyu Teng, Bai Li, Xiaoxiang Na, Yuchen Li, Zixuan Li, **jun Wang, Dongpu Cao, Nanning Zheng, Fei-Yue Wang

Abstract: Growing interest in autonomous driving (AD) and intelligent vehicles (IVs) is fueled by their promise for enhanced safety, efficiency, and economic benefits. While previous surveys have captured progress in this field, a comprehensive and forward-looking summary is needed. Our work fills this gap through three distinct articles. The first part, a "Survey of Surveys" (SoS), outlines the history, su… ▽ More Growing interest in autonomous driving (AD) and intelligent vehicles (IVs) is fueled by their promise for enhanced safety, efficiency, and economic benefits. While previous surveys have captured progress in this field, a comprehensive and forward-looking summary is needed. Our work fills this gap through three distinct articles. The first part, a "Survey of Surveys" (SoS), outlines the history, surveys, ethics, and future directions of AD and IV technologies. The second part, "Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors" delves into the development of control, computing system, communication, HD map, testing, and human behaviors in IVs. This part, the third part, reviews perception and planning in the context of IVs. Aiming to provide a comprehensive overview of the latest advancements in AD and IVs, this work caters to both newcomers and seasoned researchers. By integrating the SoS and Part I, we offer unique insights and strive to serve as a bridge between past achievements and future possibilities in this dynamic field. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: 17pages, 6figures. IEEE Transactions on Systems, Man, and Cybernetics: Systems. arXiv admin note: text overlap with arXiv:2303.09824

arXiv:2305.11239 [pdf, other]

doi 10.1109/TSMC.2023.3276218

Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors

Authors: Long Chen, Yuchen Li, Chao Huang, Yang Xing, Daxin Tian, Li Li, Zhongxu Hu, Siyu Teng, Chen Lv, **jun Wang, Dongpu Cao, Nanning Zheng, Fei-Yue Wang

Abstract: Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first… ▽ More Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first part is a Survey of Surveys (SoS) for total technologies of AD and IVs that involves the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. This is the second part (Part I for this technical survey) to review the development of control, computing system design, communication, High Definition map (HD map), testing, and human behaviors in IVs. In addition, the third part (Part II for this technical survey) is to review the perception and planning sections. The objective of this paper is to involve all the sections of AD, summarize the latest technical milestones, and guide abecedarians to quickly understand the development of AD and IVs. Combining the SoS and Part II, we anticipate that this work will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future. △ Less

Submitted 26 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: 18 pages, 4 figures, 3 tables, in IEEE Trans. Syst. Man Cybern. Syst

arXiv:2304.14419 [pdf, other]

doi 10.1145/3592107

Unsupervised Learning of Robust Spectral Shape Matching

Authors: Dongliang Cao, Paul Roetzer, Florian Bernard

Abstract: We propose a novel learning-based approach for robust 3D shape matching. Our method builds upon deep functional maps and can be trained in a fully unsupervised manner. Previous deep functional map methods mainly focus on predicting optimised functional maps alone, and then rely on off-the-shelf post-processing to obtain accurate point-wise maps during inference. However, this two-stage procedure f… ▽ More We propose a novel learning-based approach for robust 3D shape matching. Our method builds upon deep functional maps and can be trained in a fully unsupervised manner. Previous deep functional map methods mainly focus on predicting optimised functional maps alone, and then rely on off-the-shelf post-processing to obtain accurate point-wise maps during inference. However, this two-stage procedure for obtaining point-wise maps often yields sub-optimal performance. In contrast, building upon recent insights about the relation between functional maps and point-wise maps, we propose a novel unsupervised loss to couple the functional maps and point-wise maps, and thereby directly obtain point-wise maps without any post-processing. Our approach obtains accurate correspondences not only for near-isometric shapes, but also for more challenging non-isometric shapes and partial shapes, as well as shapes with different discretisation or topological noise. Using a total of nine diverse datasets, we extensively evaluate the performance and demonstrate that our method substantially outperforms previous state-of-the-art methods, even compared to recent supervised methods. Our code is available at https://github.com/dongliangcao/Unsupervised-Learning-of-Robust-Spectral-Shape-Matching. △ Less

Submitted 26 April, 2023; originally announced April 2023.

Journal ref: ACM Transactions on Graphics 2023

arXiv:2304.09603 [pdf, other]

doi 10.1007/978-3-031-34674-3

Visualising Personal Data Flows: Insights from a Case Study of Booking.com

Authors: Haiyue Yuan, Matthew Boakes, Xiao Ma, Dongmei Cao, Shujun Li

Abstract: Commercial organisations are holding and processing an ever-increasing amount of personal data. Policies and laws are continually changing to require these companies to be more transparent regarding the collection, storage, processing and sharing of this data. This paper reports our work of taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy. By… ▽ More Commercial organisations are holding and processing an ever-increasing amount of personal data. Policies and laws are continually changing to require these companies to be more transparent regarding the collection, storage, processing and sharing of this data. This paper reports our work of taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy. By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policies to inform online users about the true scale and the landscape of personal data flows. This case study can inform us about future research on more data flow-oriented privacy policy analysis and on the construction of a more comprehensive ontology on personal data flows in complicated business ecosystems. △ Less

Submitted 17 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: This is the full edition of a paper published in Intelligent Information Systems: CAiSE Forum 2023, Zaragoza, Spain, June 12-16, 2023, Proceedings, Lecture Notes in Business Information Processing (LNBIP), Volume 477, pp. 52-60, 2023, Springer Nature, https://link.springer.com/book/10.1007/978-3-031-34674-3

Journal ref: Lecture Notes in Business Information Processing (LNBIP), 2023

arXiv:2304.07633 [pdf, other]

Interpretable Detection of Out-of-Context Misinformation with Neural-Symbolic-Enhanced Large Multimodal Model

Authors: Yizhou Zhang, Loc Trinh, Defu Cao, Zijun Cui, Yan Liu

Abstract: Recent years have witnessed the sustained evolution of misinformation that aims at manipulating public opinions. Unlike traditional rumors or fake news editors who mainly rely on generated and/or counterfeited images, text and videos, current misinformation creators now more tend to use out-of-context multimedia contents (e.g. mismatched images and captions) to deceive the public and fake news det… ▽ More Recent years have witnessed the sustained evolution of misinformation that aims at manipulating public opinions. Unlike traditional rumors or fake news editors who mainly rely on generated and/or counterfeited images, text and videos, current misinformation creators now more tend to use out-of-context multimedia contents (e.g. mismatched images and captions) to deceive the public and fake news detection systems. This new type of misinformation increases the difficulty of not only detection but also clarification, because every individual modality is close enough to true information. To address this challenge, in this paper we explore how to achieve interpretable cross-modal de-contextualization detection that simultaneously identifies the mismatched pairs and the cross-modal contradictions, which is helpful for fact-check websites to document clarifications. The proposed model first symbolically disassembles the text-modality information to a set of fact queries based on the Abstract Meaning Representation of the caption and then forwards the query-image pairs into a pre-trained large vision-language model select the ``evidences" that are helpful for us to detect misinformation. Extensive experiments indicate that the proposed methodology can provide us with much more interpretable predictions while maintaining the accuracy same as the state-of-the-art model on this task. △ Less

Submitted 5 April, 2024; v1 submitted 15 April, 2023; originally announced April 2023.

Comments: 9 Pages, 3 Figures

arXiv:2304.04332 [pdf, other]

Better Together: Unifying Datalog and Equality Saturation

Authors: Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, Max Willsey

Abstract: We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms. We identify two recent applications--a unification-based pointer analysis in Datal… ▽ More We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms. We identify two recent applications--a unification-based pointer analysis in Datalog and an EqSat-based floating-point term rewriter--that have been hampered by features missing from Datalog but found in EqSat or vice-versa. We evaluate egglog by reimplementing those projects in egglog. The resulting systems in egglog are faster, simpler, and fix bugs found in the original systems. △ Less

Submitted 15 May, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: PLDI 2023

arXiv:2304.00592 [pdf, other]

PK-Chat: Pointer Network Guided Knowledge Driven Generative Dialogue Model

Authors: Cheng Deng, Bo Tong, Luoyi Fu, Jiaxin Ding, Dexing Cao, Xinbing Wang, Chenghu Zhou

Abstract: In the research of end-to-end dialogue systems, using real-world knowledge to generate natural, fluent, and human-like utterances with correct answers is crucial. However, domain-specific conversational dialogue systems may be incoherent and introduce erroneous external information to answer questions due to the out-of-vocabulary issue or the wrong knowledge from the parameters of the neural netwo… ▽ More In the research of end-to-end dialogue systems, using real-world knowledge to generate natural, fluent, and human-like utterances with correct answers is crucial. However, domain-specific conversational dialogue systems may be incoherent and introduce erroneous external information to answer questions due to the out-of-vocabulary issue or the wrong knowledge from the parameters of the neural network. In this work, we propose PK-Chat, a Pointer network guided Knowledge-driven generative dialogue model, incorporating a unified pretrained language model and a pointer network over knowledge graphs. The words generated by PK-Chat in the dialogue are derived from the prediction of word lists and the direct prediction of the external knowledge graph knowledge. Moreover, based on the PK-Chat, a dialogue system is built for academic scenarios in the case of geosciences. Finally, an academic dialogue benchmark is constructed to evaluate the quality of dialogue systems in academic scenarios and the source code is available online. △ Less

Submitted 2 April, 2023; originally announced April 2023.

ACM Class: I.2.7; F.4.1

arXiv:2303.17220 [pdf, other]

doi 10.1109/TIV.2022.3223131

Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys

Authors: Long Chen, Yuchen Li, Chao Huang, Bai Li, Yang Xing, Daxin Tian, Li Li, Zhongxu Hu, Xiaoxiang Na, Zixuan Li, Siyu Teng, Chen Lv, **jun Wang, Dongpu Cao, Nanning Zheng, Fei-Yue Wang

Abstract: Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks, lack of systematic summary and research directions in the future. Here we propose a Survey of Surveys (SoS) for total technologie… ▽ More Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks, lack of systematic summary and research directions in the future. Here we propose a Survey of Surveys (SoS) for total technologies of AD and IVs that reviews the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. To our knowledge, this article is the first SoS with milestones in AD and IVs, which constitutes our complete research work together with two other technical surveys. We anticipate that this article will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future. △ Less

Submitted 30 March, 2023; originally announced March 2023.

Comments: 13 pages, 3 tables, 0 figure

Journal ref: IEEE Transactions on Intelligent Vehicles, vol. 8, no. 2, pp. 1046-1056, Feb. 2023

arXiv:2303.10971 [pdf, other]

Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching

Authors: Dongliang Cao, Florian Bernard

Abstract: The matching of 3D shapes has been extensively studied for shapes represented as surface meshes, as well as for shapes represented as point clouds. While point clouds are a common representation of raw real-world 3D data (e.g. from laser scanners), meshes encode rich and expressive topological information, but their creation typically requires some form of (often manual) curation. In turn, methods… ▽ More The matching of 3D shapes has been extensively studied for shapes represented as surface meshes, as well as for shapes represented as point clouds. While point clouds are a common representation of raw real-world 3D data (e.g. from laser scanners), meshes encode rich and expressive topological information, but their creation typically requires some form of (often manual) curation. In turn, methods that purely rely on point clouds are unable to meet the matching quality of mesh-based methods that utilise the additional topological structure. In this work we close this gap by introducing a self-supervised multimodal learning strategy that combines mesh-based functional map regularisation with a contrastive loss that couples mesh and point cloud data. Our shape matching approach allows to obtain intramodal correspondences for triangle meshes, complete point clouds, and partially observed point clouds, as well as correspondences across these data modalities. We demonstrate that our method achieves state-of-the-art results on several challenging benchmark datasets even in comparison to recent supervised methods, and that our method reaches previously unseen cross-dataset generalisation ability. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: accepted by CVPR 2023

arXiv:2303.02320 [pdf, other]

Estimating Treatment Effects from Irregular Time Series Observations with Hidden Confounders

Authors: Defu Cao, James Enouen, Yu**g Wang, Xiangchen Song, Chuizheng Meng, Hao Niu, Yan Liu

Abstract: Causal analysis for time series data, in particular estimating individualized treatment effect (ITE), is a key task in many real-world applications, such as finance, retail, healthcare, etc. Real-world time series can include large-scale, irregular, and intermittent time series observations, raising significant challenges to existing work attempting to estimate treatment effects. Specifically, the… ▽ More Causal analysis for time series data, in particular estimating individualized treatment effect (ITE), is a key task in many real-world applications, such as finance, retail, healthcare, etc. Real-world time series can include large-scale, irregular, and intermittent time series observations, raising significant challenges to existing work attempting to estimate treatment effects. Specifically, the existence of hidden confounders can lead to biased treatment estimates and complicate the causal inference process. In particular, anomaly hidden confounders which exceed the typical range can lead to high variance estimates. Moreover, in continuous time settings with irregular samples, it is challenging to directly handle the dynamics of causality. In this paper, we leverage recent advances in Lipschitz regularization and neural controlled differential equations (CDE) to develop an effective and scalable solution, namely LipCDE, to address the above challenges. LipCDE can directly model the dynamic causal relationships between historical data and outcomes with irregular samples by considering the boundary of hidden confounders given by Lipschitz-constrained neural networks. Furthermore, we conduct extensive experiments on both synthetic and real-world datasets to demonstrate the effectiveness and scalability of LipCDE. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted by AAAI 2023

arXiv:2303.02304 [pdf, other]

Coupled Multiwavelet Neural Operator Learning for Coupled Partial Differential Equations

Authors: Xiongye Xiao, Defu Cao, Ruochen Yang, Gaurav Gupta, Gengshuo Liu, Chenzhong Yin, Radu Balan, Paul Bogdan

Abstract: Coupled partial differential equations (PDEs) are key tasks in modeling the complex dynamics of many physical processes. Recently, neural operators have shown the ability to solve PDEs by learning the integral kernel directly in Fourier/Wavelet space, so the difficulty for solving the coupled PDEs depends on dealing with the coupled map**s between the functions. Towards this end, we propose a \t… ▽ More Coupled partial differential equations (PDEs) are key tasks in modeling the complex dynamics of many physical processes. Recently, neural operators have shown the ability to solve PDEs by learning the integral kernel directly in Fourier/Wavelet space, so the difficulty for solving the coupled PDEs depends on dealing with the coupled map**s between the functions. Towards this end, we propose a \textit{coupled multiwavelets neural operator} (CMWNO) learning scheme by decoupling the coupled integral kernels during the multiwavelet decomposition and reconstruction procedures in the Wavelet space. The proposed model achieves significantly higher accuracy compared to previous learning-based solvers in solving the coupled PDEs including Gray-Scott (GS) equations and the non-local mean field game (MFG) problem. According to our experimental results, the proposed model exhibits a $2\times \sim 4\times$ improvement relative $L$2 error compared to the best results from the state-of-the-art models. △ Less

Submitted 8 December, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted to ICLR 2023

arXiv:2302.09446 [pdf, other]

Estimating Treatment Effects in Continuous Time with Hidden Confounders

Authors: Defu Cao, James Enouen, Yan Liu

Abstract: Estimating treatment effects plays a crucial role in causal inference, having many real-world applications like policy analysis and decision making. Nevertheless, estimating treatment effects in the longitudinal setting in the presence of hidden confounders remains an extremely challenging problem. Recently, there is a growing body of work attempting to obtain unbiased ITE estimates from time-dyna… ▽ More Estimating treatment effects plays a crucial role in causal inference, having many real-world applications like policy analysis and decision making. Nevertheless, estimating treatment effects in the longitudinal setting in the presence of hidden confounders remains an extremely challenging problem. Recently, there is a growing body of work attempting to obtain unbiased ITE estimates from time-dynamic observational data by ignoring the possible existence of hidden confounders. Additionally, many existing works handling hidden confounders are not applicable for continuous-time settings. In this paper, we extend the line of work focusing on deconfounding in the dynamic time setting in the presence of hidden confounders. We leverage recent advancements in neural differential equations to build a latent factor model using a stochastic controlled differential equation and Lipschitz constrained convolutional operation in order to continuously incorporate information about ongoing interventions and irregularly sampled observations. Experiments on both synthetic and real-world datasets highlight the promise of continuous time methods for estimating treatment effects in the presence of hidden confounders. △ Less

Submitted 20 February, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: 7 pages. First presentation was at ICML 2022 workshop Continuous time methods for machine learning

arXiv:2302.07622 [pdf]

Embodied Footprints: A Safety-guaranteed Collision Avoidance Model for Numerical Optimization-based Trajectory Planning

Authors: Bai Li, Youmin Zhang, Tantan Zhang, Tankut Acarman, Yakun Ouyang, Li Li, Hairong Dong, Dongpu Cao

Abstract: Optimization-based methods are commonly applied in autonomous driving trajectory planners, which transform the continuous-time trajectory planning problem into a finite nonlinear program with constraints imposed at finite collocation points. However, potential violations between adjacent collocation points can occur. To address this issue thoroughly, we propose a safety-guaranteed collision-avoida… ▽ More Optimization-based methods are commonly applied in autonomous driving trajectory planners, which transform the continuous-time trajectory planning problem into a finite nonlinear program with constraints imposed at finite collocation points. However, potential violations between adjacent collocation points can occur. To address this issue thoroughly, we propose a safety-guaranteed collision-avoidance model to mitigate collision risks within optimization-based trajectory planners. This model introduces an embodied footprint, an enlarged representation of the vehicle's nominal footprint. If the embodied footprints do not collide with obstacles at finite collocation points, then the ego vehicle's nominal footprint is guaranteed to be collision-free at any of the infinite moments between adjacent collocation points. According to our theoretical analysis, we define the geometric size of an embodied footprint as a simple function of vehicle velocity and curvature. Particularly, we propose a trajectory optimizer with the embodied footprints that can theoretically set an appropriate number of collocation points prior to the optimization process. We conduct this research to enhance the foundation of optimization-based planners in robotics. Comparative simulations and field tests validate the completeness, solution speed, and solution quality of our proposal. △ Less

Submitted 14 September, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: 15 pages, 16 figures

arXiv:2212.04596 [pdf, other]

doi 10.1145/3571207

babble: Learning Better Abstractions with E-Graphs and Anti-Unification

Authors: David Cao, Rose Kunkel, Chandrakana Nandi, Max Willsey, Zachary Tatlock, Nadia Polikarpova

Abstract: Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in… ▽ More Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input. We propose library learning modulo theory (LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novel e-graph anti-unification technique to find common patterns in the corpus more directly and efficiently. We implemented LLMT in a tool named BABBLE. Our evaluation shows that BABBLE achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that BABBLE learns reusable functions on inputs previously out of reach for library learning. △ Less

Submitted 8 December, 2022; originally announced December 2022.

Comments: POPL 2023

arXiv:2211.11513 [pdf, other]

DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift

Authors: Defu Cao, Yousef El-Laham, Loc Trinh, Svitlana Vyetrenko, Yan Liu

Abstract: In electronic trading markets, limit order books (LOBs) provide information about pending buy/sell orders at various price levels for a given security. Recently, there has been a growing interest in using LOB data for resolving downstream machine learning tasks (e.g., forecasting). However, dealing with out-of-distribution (OOD) LOB data is challenging since distributional shifts are unlabeled in… ▽ More In electronic trading markets, limit order books (LOBs) provide information about pending buy/sell orders at various price levels for a given security. Recently, there has been a growing interest in using LOB data for resolving downstream machine learning tasks (e.g., forecasting). However, dealing with out-of-distribution (OOD) LOB data is challenging since distributional shifts are unlabeled in current publicly available LOB datasets. Therefore, it is critical to build a synthetic LOB dataset with labeled OOD samples serving as a testbed for develo** models that generalize well to unseen scenarios. In this work, we utilize a multi-agent market simulator to build a synthetic LOB dataset, named DSLOB, with and without market stress scenarios, which allows for the design of controlled distributional shift benchmarking. Using the proposed synthetic dataset, we provide a holistic analysis on the forecasting performance of three different state-of-the-art forecasting methods. Our results reflect the need for increased researcher efforts to develop algorithms with robustness to distributional shifts in high-frequency time series data. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Comments: 11 pages, 5 figures, already accepted by NeurIPS 2022 Distribution Shifts Workshop

arXiv:2210.07518 [pdf, other]

Counterfactual Neural Temporal Point Process for Estimating Causal Influence of Misinformation on Social Media

Authors: Yizhou Zhang, Defu Cao, Yan Liu

Abstract: Recent years have witnessed the rise of misinformation campaigns that spread specific narratives on social media to manipulate public opinions on different areas, such as politics and healthcare. Consequently, an effective and efficient automatic methodology to estimate the influence of the misinformation on user beliefs and activities is needed. However, existing works on misinformation impact es… ▽ More Recent years have witnessed the rise of misinformation campaigns that spread specific narratives on social media to manipulate public opinions on different areas, such as politics and healthcare. Consequently, an effective and efficient automatic methodology to estimate the influence of the misinformation on user beliefs and activities is needed. However, existing works on misinformation impact estimation either rely on small-scale psychological experiments or can only discover the correlation between user behaviour and misinformation. To address these issues, in this paper, we build up a causal framework that model the causal effect of misinformation from the perspective of temporal point process. To adapt the large-scale data, we design an efficient yet precise way to estimate the Individual Treatment Effect(ITE) via neural temporal point process and gaussian mixture models. Extensive experiments on synthetic dataset verify the effectiveness and efficiency of our model. We further apply our model on a real-world dataset of social media posts and engagements about COVID-19 vaccines. The experimental results indicate that our model recognized identifiable causal effect of misinformation that hurts people's subjective emotions toward the vaccines. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: 19 pages, 8 figures, already accepted by NeurIPS 2022

arXiv:2210.06006 [pdf, other]

BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline

Authors: Ruihao Wang, Jian Qin, Kaiying Li, Yaochen Li, Dong Cao, **tao Xu

Abstract: 3D lane detection which plays a crucial role in vehicle routing, has recently been a rapidly develo** topic in autonomous driving. Previous works struggle with practicality due to their complicated spatial transformations and inflexible representations of 3D lanes. Faced with the issues, our work proposes an efficient and robust monocular 3D lane detection called BEV-LaneDet with three main cont… ▽ More 3D lane detection which plays a crucial role in vehicle routing, has recently been a rapidly develo** topic in autonomous driving. Previous works struggle with practicality due to their complicated spatial transformations and inflexible representations of 3D lanes. Faced with the issues, our work proposes an efficient and robust monocular 3D lane detection called BEV-LaneDet with three main contributions. First, we introduce the Virtual Camera that unifies the in/extrinsic parameters of cameras mounted on different vehicles to guarantee the consistency of the spatial relationship among cameras. It can effectively promote the learning procedure due to the unified visual space. We secondly propose a simple but efficient 3D lane representation called Key-Points Representation. This module is more suitable to represent the complicated and diverse 3D lane structures. At last, we present a light-weight and chip-friendly spatial transformation module named Spatial Transformation Pyramid to transform multiscale front-view features into BEV features. Experimental results demonstrate that our work outperforms the state-of-the-art approaches in terms of F-Score, being 10.6% higher on the OpenLane dataset and 5.9% higher on the Apollo 3D synthetic dataset, with a speed of 185 FPS. The source code will released at https://github.com/gigo-team/bev_lane_det. △ Less

Submitted 11 March, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted by CVPR2023

arXiv:2210.05084 [pdf, other]

Covert Communication Gains from Adversary's Uncertainty of Phase Angles

Authors: Sen Qiao, Daming Cao, Qiaosheng Zhang, Yinfei Xu, Guangjie Liu

Abstract: This work investigates the phase gain of intelligent reflecting surface (IRS) covert communication over complex-valued additive white Gaussian noise (AWGN) channels. The transmitter Alice intends to transmit covert messages to the legitimate receiver Bob via reflecting the broadcast signals from a radio frequency (RF) source, while rendering the adversary Willie's detector arbitrarily close to ine… ▽ More This work investigates the phase gain of intelligent reflecting surface (IRS) covert communication over complex-valued additive white Gaussian noise (AWGN) channels. The transmitter Alice intends to transmit covert messages to the legitimate receiver Bob via reflecting the broadcast signals from a radio frequency (RF) source, while rendering the adversary Willie's detector arbitrarily close to ineffective. Our analyses show that, compared to the covert capacity for classical AWGN channels, we can achieve a covertness gain of value 2 by leveraging Willie's uncertainty of phase angles. This covertness gain is achieved when the number of possible phase angle pairs $N=2$. More interestingly, our results show that the covertness gain will not further increase with $N$ as long as $N \ge 2$, even if it approaches infinity. △ Less

Submitted 6 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

arXiv:2207.09610 [pdf, other]

Unsupervised Deep Multi-Shape Matching

Authors: Dongliang Cao, Florian Bernard

Abstract: 3D shape matching is a long-standing problem in computer vision and computer graphics. While deep neural networks were shown to lead to state-of-the-art results in shape matching, existing learning-based approaches are limited in the context of multi-shape matching: (i) either they focus on matching pairs of shapes only and thus suffer from cycle-inconsistent multi-matchings, or (ii) they require… ▽ More 3D shape matching is a long-standing problem in computer vision and computer graphics. While deep neural networks were shown to lead to state-of-the-art results in shape matching, existing learning-based approaches are limited in the context of multi-shape matching: (i) either they focus on matching pairs of shapes only and thus suffer from cycle-inconsistent multi-matchings, or (ii) they require an explicit template shape to address the matching of a collection of shapes. In this paper, we present a novel approach for deep multi-shape matching that ensures cycle-consistent multi-matchings while not depending on an explicit template shape. To this end, we utilise a shape-to-universe multi-matching representation that we combine with powerful functional map regularisation, so that our multi-shape matching neural network can be trained in a fully unsupervised manner. While the functional map regularisation is only considered during training time, functional maps are not computed for predicting correspondences, thereby allowing for fast inference. We demonstrate that our method achieves state-of-the-art results on several challenging benchmark datasets, and, most remarkably, that our unsupervised method even outperforms recent supervised methods. △ Less

Submitted 19 July, 2022; originally announced July 2022.

Comments: to be published in ECCV2022

arXiv:2207.07896 [pdf, other]

Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars

Authors: Dongjiang Cao, Ruofeng Liu, Hao Li, Shuai Wang, Wenchao Jiang, Chris Xiaoxuan Lu

Abstract: Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics, etc. This work studies the problem of cross-modal human re-identification (ReID), in response to the regular human movements across camera-allowed regions (e.g., streets) and camera-restricted regio… ▽ More Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics, etc. This work studies the problem of cross-modal human re-identification (ReID), in response to the regular human movements across camera-allowed regions (e.g., streets) and camera-restricted regions (e.g., offices) deployed with heterogeneous sensors. By leveraging the emerging low-cost RGB-D cameras and mmWave radars, we propose the first-of-its-kind vision-RF system for cross-modal multi-person ReID at the same time. Firstly, to address the fundamental inter-modality discrepancy, we propose a novel signature synthesis algorithm based on the observed specular reflection model of a human body. Secondly, an effective cross-modal deep metric learning model is introduced to deal with interference caused by unsynchronized data across radars and cameras. Through extensive experiments in both indoor and outdoor environments, we demonstrate that our proposed system is able to achieve ~92.5% top-1 accuracy and ~97.5% top-5 accuracy out of 56 volunteers. We also show that our proposed system is able to robustly reidentify subjects even when multiple subjects are present in the sensors' field of view. △ Less

Submitted 16 July, 2022; originally announced July 2022.

Comments: 24 pages, 20 figures, accepted to IMWUT

arXiv:2206.07257 [pdf, other]

doi 10.1093/mnras/stac1693

Investigation of stellar magnetic activity using variational autoencoder based on low-resolution spectroscopic survey

Authors: Yue Xiang, Shenghong Gu, Dongtao Cao

Abstract: We apply the variational autoencoder (VAE) to the LAMOST-K2 low-resolution spectra to detect the magnetic activity of the stars in the K2 field. After the training on the spectra of the selected inactive stars, the VAE model can efficiently generate the synthetic reference templates needed by the spectral subtraction procedure, without knowing any stellar parameters. Then we detect the peculiar sp… ▽ More We apply the variational autoencoder (VAE) to the LAMOST-K2 low-resolution spectra to detect the magnetic activity of the stars in the K2 field. After the training on the spectra of the selected inactive stars, the VAE model can efficiently generate the synthetic reference templates needed by the spectral subtraction procedure, without knowing any stellar parameters. Then we detect the peculiar spectral features, such as chromospheric emissions, strong nebular emissions and lithium absorptions, in our sample. We measure the emissions of the chromospheric activity indicators, H$α$ and Ca II infrared triplet (IRT) lines, to quantify the stellar magnetic activity. The excess emissions of H$α$ and Ca II IRT lines of the active stars are correlated well to the rotational periods and the amplitudes of light curves derived from the K2 photometry. We degrade the LAMOST spectra to simulate the slitless spectra of the China Space Station Telescope (CSST) and apply the VAE to the simulated data. For cool active stars, we reveal a good agreement between the equivalent widths (EWs) of H$α$ line derived from the spectra with two resolutions. The result indicates the ability of identifying the magnetically active stars in the future CSST survey, which will deliver an unprecedented large database of low-resolution spectra as well as simultaneous multi-band photometry of stars. △ Less

Submitted 6 July, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 13 pages, 19 figures, accepted for publication in MNRAS. Table 1 is available on Zenodo at https://doi.org/10.5281/zenodo.6802956 and the code can be found on GitHub at https://github.com/xylib/vae-for-spectroscopic-survey

arXiv:2203.16797 [pdf, ps, other]

When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning

Authors: Chuizheng Meng, Sungyong Seo, Defu Cao, Sam Griesemer, Yan Liu

Abstract: Physics-informed machine learning (PIML), referring to the combination of prior knowledge of physics, which is the high level abstraction of natural phenomenons and human behaviours in the long history, with data-driven machine learning models, has emerged as an effective way to mitigate the shortage of training data, to increase models' generalizability and to ensure the physical plausibility of… ▽ More Physics-informed machine learning (PIML), referring to the combination of prior knowledge of physics, which is the high level abstraction of natural phenomenons and human behaviours in the long history, with data-driven machine learning models, has emerged as an effective way to mitigate the shortage of training data, to increase models' generalizability and to ensure the physical plausibility of results. In this paper, we survey an abundant number of recent works in PIML and summarize them from three aspects: (1) motivations of PIML, (2) physics knowledge in PIML, (3) methods of physics knowledge integration in PIML. We also discuss current challenges and corresponding research opportunities in PIML. △ Less

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2203.16697 [pdf, other]

doi 10.1145/3519939.3523450

Type-Directed Program Synthesis for RESTful APIs

Authors: Zheng Guo, David Cao, Davin Tjong, Jean Yang, Cole Schlesinger, Nadia Polikarpova

Abstract: With the rise of software-as-a-service and microservice architectures, RESTful APIs are now ubiquitous in mobile and web applications. A service can have tens or hundreds of API methods, making it a challenge for programmers to find the right combination of methods to solve their task. We present APIphany, a component-based synthesizer for programs that compose calls to RESTful APIs. The main in… ▽ More With the rise of software-as-a-service and microservice architectures, RESTful APIs are now ubiquitous in mobile and web applications. A service can have tens or hundreds of API methods, making it a challenge for programmers to find the right combination of methods to solve their task. We present APIphany, a component-based synthesizer for programs that compose calls to RESTful APIs. The main innovation behind APIphany is the use of precise semantic types, both to specify user intent and to direct the search. APIphany contributes three novel mechanisms to overcome challenges in adapting component-based synthesis to the REST domain: (1) a type inference algorithm for augmenting REST specifications with semantic types; (2) an efficient synthesis technique for "wrangling" semi-structured data, which is commonly required in working with RESTful APIs; and (3) a new form of simulated execution to avoid executing APIs calls during synthesis. We evaluate APIphany on three real-world APIs and 32 tasks extracted from GitHub repositories and StackOverflow. In our experiments, APIphany found correct solutions to 29 tasks, with 23 of them reported among top ten synthesis results. △ Less

Submitted 5 April, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.14064 [pdf, other]

doi 10.1109/TMC.2023.3239339

BARGAIN-MATCH: A Game Theoretical Approach for Resource Allocation and Task Offloading in Vehicular Edge Computing Networks

Authors: Zemin Sun, Geng Sun, Yanheng Liu, Jian Wang, Dongpu Cao

Abstract: Vehicular edge computing (VEC) is emerging as a promising architecture of vehicular networks (VNs) by deploying the cloud computing resources at the edge of the VNs. This work aims to optimize resource allocation and task offloading in VEC networks. Specifically, we formulate a game theoretical resource allocation and task offloading problem (GTRATOP) that aims to maximize the system performance b… ▽ More Vehicular edge computing (VEC) is emerging as a promising architecture of vehicular networks (VNs) by deploying the cloud computing resources at the edge of the VNs. This work aims to optimize resource allocation and task offloading in VEC networks. Specifically, we formulate a game theoretical resource allocation and task offloading problem (GTRATOP) that aims to maximize the system performance by jointly considering the incentive for cooperation, competition among vehicles, heterogeneity between VEC servers and vehicles, and inherent dynamic of VNs. Since the formulated GTRATOP is NP-hard, we propose an adaptive approach for resource allocation and task offloading in VEC networks by incorporating bargaining game and matching game, which is called BARGAIN-MATCH. First, for resource allocation, a bargaining game-based incentive is proposed to stimulate the vehicles and VEC servers to negotiate the optimal resource allocation and pricing decisions. Second, for task offloading, a many-to-one matching scheme is proposed to decide the optimal offloading strategies. Third, the dynamic and time-varying features are considered to adapt the strategies of BARGAIN-MATCH to the real-time VEC networks. Moreover, the BARGAIN-MATCH is proved to be stable and weak Pareto optimal. Simulation results demonstrate that the proposed BARGAIN-MATCH achieves superior system performance and efficiency compared to other methods, especially when the system workload is heavy. △ Less

Submitted 23 December, 2023; v1 submitted 26 March, 2022; originally announced March 2022.

Showing 1–50 of 103 results for author: Cao, D