Search | arXiv e-print repository

Advancements in Feature Extraction Recognition of Medical Imaging Systems Through Deep Learning Technique

Authors: Qishi Zhan, Dan Sun, Erdi Gao, Yuhan Ma, Yaxin Liang, Haowei Yang

Abstract: This study introduces a novel unsupervised medical image feature extraction method that employs spatial stratification techniques. A weight-based objective function is proposed to achieve fast image recognition. The algorithm divides the pixels of the image into multiple subdomains and uses a quadtree to access the image. A threshold optimization technique based on a simplex algorithm is presented. To handle the nonlinear characteristics of hyperspectral images, a generalized discriminant analysis algorithm based on a kernel function is proposed. In this project, a hyperspectral remote sensing image is taken as the object, and we investigate its mathematical modeling, solution methods, and feature extraction techniques. It is found that different types of objects are mutually independent and compact in image processing. Compared with the traditional linear discriminant method, the method yields better image segmentation. It not only overcomes the traditional method's susceptibility to lighting but also extracts object features quickly and accurately. It has important reference value for clinical diagnosis.

Submitted 23 May, 2024; originally announced June 2024.

Comments: conference
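The abstract above says the algorithm splits the image into subdomains and accesses it through a quadtree. A minimal sketch of that kind of subdivision follows; it is not the authors' implementation. It assumes a grayscale image stored as a square list of lists whose side is a power of two, and a hypothetical homogeneity rule: a block is split into four quadrants while its max minus min intensity exceeds `threshold`.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    row: int      # top-left row of the block
    col: int      # top-left column of the block
    size: int     # side length of the square block
    mean: float   # mean intensity inside the block


def quadtree_leaves(img, threshold, row=0, col=0, size=None):
    """Recursively split a square block into quadrants until each block is homogeneous.

    Homogeneous here (an assumed rule) means max - min intensity <= threshold.
    """
    if size is None:
        size = len(img)
    block = [img[r][c] for r in range(row, row + size) for c in range(col, col + size)]
    if size == 1 or max(block) - min(block) <= threshold:
        return [Leaf(row, col, size, sum(block) / len(block))]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves.extend(quadtree_leaves(img, threshold, row + dr, col + dc, half))
    return leaves
```

For a 4x4 image of zeros with a single bright pixel in one corner, the root splits once and only the quadrant containing that pixel splits again, giving 3 + 4 = 7 leaves.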

arXiv:2406.08838 [pdf]

Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

Authors: Dan Sun, Yaxin Liang, Yining Yang, Yuhan Ma, Qishi Zhan, Erdi Gao

Abstract: This project studies image representation based on an attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. Word vectors are computed with the Word2Vec method and then evaluated by a word-embedding convolutional neural network. The published experimental results of the two groups were tested. The experimental results show that this method can convert discrete features into continuous characters, thus reducing the complexity of feature preprocessing. Word2Vec and natural language processing technology are integrated to evaluate missing image features directly. The robustness of the image feature evaluation model is improved by exploiting the strong feature analysis capability of convolutional neural networks. This project aims to improve existing image feature identification methods and eliminate subjective influence in the evaluation process. The simulation results indicate that the newly developed approach is viable and effectively augments the features within the produced representations.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2403.13430 [pdf, other]

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

Authors: Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang

Abstract: Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size and their competitive performance compared to larger state-of-the-art models, thus validating the effectiveness of MTP.

Submitted 29 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE JSTARS Special issue on "Large-Scale Pretraining for Interpretation Promotion in Remote Sensing Domain". The codes and pretrained models are available at https://github.com/ViTAE-Transformer/MTP

arXiv:2401.08537 [pdf]

Spatial Entity Resolution between Restaurant Locations and Transportation Destinations in Southeast Asia

Authors: Emily Gao, Dominic Widdows

Abstract: As a tech company, Grab has expanded from transportation to food delivery, aiming to serve Southeast Asia with hyperlocalized applications. Information about places as transportation destinations can help to improve our knowledge about places as restaurants, so long as the spatial entity resolution problem between these datasets can be solved. In this project, we attempted to recognize identical place entities from databases of Points-of-Interest (POI) and GrabFood restaurants, using their spatial and textual attributes, i.e., latitude, longitude, place name, and street address. Distance metrics were calculated for these attributes and fed to tree-based classifiers. POI-restaurant matching was conducted separately for Singapore, Philippines, Indonesia, and Malaysia. Experimental estimates demonstrate that a matching POI can be found for over 35% of restaurants in these countries. As part of these estimates, test datasets were manually created, and RandomForest, AdaBoost, Gradient Boosting, and XGBoost perform well, with most accuracy, precision, and recall scores close to or higher than 90% for matched vs. unmatched classification. To the authors' knowledge, there are no previous published scientific papers devoted to matching of spatial entities for the Southeast Asia region.

Submitted 16 January, 2024; originally announced January 2024.

Journal ref: 6th International Conference on Geospatial Information Systems Theory, Applications, and Management. GISTAM 2020, Prague, Czech Republic, May 7-9, 2020
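The abstract above describes computing distance metrics over latitude, longitude, name, and street address, then feeding them to tree-based classifiers. The abstract does not say which metrics were used, so the sketch below assumes two common choices: great-circle (haversine) distance for coordinates and difflib's `SequenceMatcher` ratio for text. The dict keys `lat`, `lon`, `name`, and `address` are hypothetical.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def text_similarity(a, b):
    """Case-insensitive similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pair_features(poi, restaurant):
    """Feature vector for one candidate (POI, restaurant) pair.

    Both arguments are dicts with hypothetical keys: lat, lon, name, address.
    """
    return [
        haversine_m(poi["lat"], poi["lon"], restaurant["lat"], restaurant["lon"]),
        text_similarity(poi["name"], restaurant["name"]),
        text_similarity(poi["address"], restaurant["address"]),
    ]
```

A list of such vectors, labelled matched or unmatched, could then be passed to any tree-based classifier, for example scikit-learn's `RandomForestClassifier`.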

arXiv:2401.04739 [pdf]

Content-Conditioned Generation of Stylized Free hand Sketches

Authors: Jiajun Liu, Siyuan Wang, Guangming Zhu, Liang Zhang, Ning Li, Eryang Gao

Abstract: In recent years, the recognition of free-hand sketches has remained a popular task. However, in some special fields such as the military field, free-hand sketches are difficult to sample on a large scale. Common data augmentation and image generation techniques struggle to produce images with varied free-hand sketching styles. As a result, recognition and segmentation tasks in related fields are limited. In this paper, we propose a novel generative adversarial network that can accurately generate realistic free-hand sketches with various styles. We explore the performance of the model, including using styles randomly sampled from a prior normal distribution to generate images with various free-hand sketching styles, disentangling the painters' styles from known free-hand sketches to generate images with specific styles, and generating images of unknown classes that are not in the training set. We further demonstrate, with qualitative and quantitative evaluations on SketchIME, our advantages in visual quality, content accuracy, and style imitation.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 6 pages, 7 figures, ICSMD

arXiv:2401.03828 [pdf]

A multimodal gesture recognition dataset for desktop human-computer interaction

Authors: Qi Wang, Fengchao Zhu, Guangming Zhu, Liang Zhang, Ning Li, Eryang Gao

Abstract: Gesture recognition is an indispensable component of natural and efficient human-computer interaction technology, particularly in desktop-level applications, where it can significantly enhance people's productivity. However, the current gesture recognition community lacks a suitable desktop-level (top-view perspective) dataset for lightweight gesture capture devices. In this study, we have established a dataset named GR4DHCI. What distinguishes this dataset is its inherent naturalness, intuitive characteristics, and diversity. Its primary purpose is to serve as a valuable resource for the development of desktop-level portable applications. GR4DHCI comprises over 7,000 gesture samples and a total of 382,447 frames for both Stereo IR and skeletal modalities. We also address the variances in hand positioning during desktop interactions by incorporating 27 different hand positions into the dataset. Building upon the GR4DHCI dataset, we conducted a series of experimental studies, the results of which demonstrate that the fine-grained classification blocks proposed in this paper can enhance the model's recognition accuracy. Our dataset and experimental findings presented in this paper are anticipated to propel advancements in desktop-level gesture recognition research.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.07999 [pdf, other]

Random Serial Dictatorship with Transfers

Authors: Sudharsan Sundar, Eric Gao, Trevor Chow, Matthew Ding

Abstract: It is well known that Random Serial Dictatorship is strategy-proof and leads to a Pareto-Efficient outcome. We show that this result breaks down when individuals are allowed to make transfers, and adapt Random Serial Dictatorship to encompass trades between individuals. Strategic analysis of play under the new mechanisms we define is given, accompanied by simulations to quantify the gains from trade.

Submitted 13 December, 2023; originally announced December 2023.
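The paper above starts from standard Random Serial Dictatorship (RSD), without transfers. A minimal sketch of that baseline follows, assuming strict preference lists and at most one object per agent; the paper's transfer-based variants are not shown.

```python
import random


def random_serial_dictatorship(preferences, objects, rng=None):
    """Allocate at most one object per agent via Random Serial Dictatorship.

    preferences: dict mapping agent -> list of objects, most preferred first.
    objects: iterable of available objects.
    Returns a dict agent -> assigned object (None if nothing acceptable remains).
    """
    rng = rng or random.Random()
    order = list(preferences)
    rng.shuffle(order)  # draw a uniformly random priority order
    remaining = set(objects)
    allocation = {}
    for agent in order:
        # Each agent, in turn, takes their favourite object still available.
        choice = next((o for o in preferences[agent] if o in remaining), None)
        allocation[agent] = choice
        if choice is not None:
            remaining.remove(choice)
    return allocation
```

Because an agent's pick depends only on which objects remain when their turn comes, reporting a false preference list can never get them a better object, which is the strategy-proofness the abstract refers to.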

arXiv:2302.02394 [pdf, other]

Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion

Authors: Zuopeng Yang, Tianshu Chu, Xin Lin, Erdun Gao, Daqing Liu, Jie Yang, Chaoyue Wang

Abstract: The recent success of text-to-image generation diffusion models has also revolutionized semantic image editing, enabling the manipulation of images based on query/target texts. Despite these advancements, a significant challenge lies in the potential introduction of contextual prior bias in pre-trained models during image editing, e.g., making unexpected modifications to inappropriate regions. To address this issue, we present a novel approach called Dual-Cycle Diffusion, which generates an unbiased mask to guide image editing. The proposed model incorporates a Bias Elimination Cycle that consists of both a forward path and an inverted path, each featuring a Structural Consistency Cycle to ensure the preservation of image content during the editing process. The forward path utilizes the pre-trained model to produce the edited image, while the inverted path converts the result back to the source image. The unbiased mask is generated by comparing differences between the processed source image and the edited image to ensure that both conform to the same distribution. Our experiments demonstrate the effectiveness of the proposed method, as it significantly improves the D-CLIP score from 0.272 to 0.283. The code will be available at https://github.com/JohnDreamer/DualCycleDiffsion.

Submitted 5 October, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: This paper has been accepted by the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:2209.02946 [pdf, other]

On the Sparse DAG Structure Learning Based on Adaptive Lasso

Authors: Danru Xu, Erdun Gao, Wei Huang, Menghan Wang, Andy Song, Mingming Gong

Abstract: Learning the underlying Bayesian Networks (BNs), represented by directed acyclic graphs (DAGs), of the concerned events from purely observational data is a crucial part of evidential reasoning. This task remains challenging due to the large and discrete search space. A recent flurry of developments following NOTEARS[1] recasts this combinatorial problem as a continuous optimization problem by leveraging an algebraic equality characterization of acyclicity. However, these continuous optimization methods tend to produce non-sparse graphs after numerical optimization, which makes it difficult to rule out potentially cycle-inducing edges or falsely discovered edges with small values. To address this issue, in this paper, we develop a completely data-driven DAG structure learning method that needs no predefined value to post-threshold small values. We name our method NOTEARS with adaptive Lasso (NOTEARS-AL); it applies the adaptive penalty method to ensure the sparsity of the estimated DAG. Moreover, we show that NOTEARS-AL also inherits the oracle properties under some specific conditions. Extensive experiments on both synthetic data and a real-world dataset demonstrate that our method consistently outperforms NOTEARS.

Submitted 17 February, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 11 pages, 8 figures
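Two quantities from the abstract above can be made concrete. The first is the algebraic acyclicity characterization used by NOTEARS, h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W has no directed cycles. The second is an adaptive Lasso penalty, where each edge is weighted by 1/|Ŵ_ij|^γ from an initial estimate Ŵ. This is a minimal NumPy sketch, not the NOTEARS-AL code: the truncated power series for the matrix exponential, `gamma`, and `eps` are assumptions (in practice `scipy.linalg.expm` would usually compute the exponential).

```python
import numpy as np


def notears_acyclicity(W, terms=40):
    """h(W) = tr(exp(W * W)) - d; zero exactly when W encodes a DAG.

    exp is approximated by a truncated power series, adequate for small,
    well-scaled W.
    """
    A = W * W  # element-wise square keeps every entry non-negative
    d = A.shape[0]
    term = np.eye(d)
    total = np.eye(d)
    for k in range(1, terms):
        term = term @ A / k  # accumulates A^k / k!
        total += term
    return float(np.trace(total) - d)


def adaptive_l1_penalty(W, W_init, gamma=1.0, eps=1e-8):
    """Adaptive Lasso penalty: sum_ij |W_ij| / |W_init_ij|^gamma.

    Edges that were small in the initial estimate get large weights and are
    pushed toward zero; strong edges are penalised only lightly.
    """
    weights = 1.0 / (np.abs(W_init) + eps) ** gamma
    return float(np.sum(weights * np.abs(W)))
```

For a 2-node DAG, W ∘ W is nilpotent and h is exactly 0. For a 2-cycle with unit weights, h = 2cosh(1) − 2 ≈ 1.086. In NOTEARS-style methods, h enters the objective as an equality constraint (typically handled with an augmented Lagrangian). This sketch only evaluates the two terms.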

arXiv:2205.13869 [pdf, other]

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

Authors: Erdun Gao, Ignavier Ng, Mingming Gong, Li Shen, Wei Huang, Tongliang Liu, Kun Zhang, Howard Bondell

Abstract: State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may suffer from suboptimality, as the imputation algorithm may introduce bias for modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and the identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, in cases where computing the posterior distributions of parameters in closed-form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs and uses a likelihood-based causal discovery algorithm with directed acyclic graph constraint. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.

Submitted 16 January, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: Accepted to NeurIPS22

arXiv:2112.03555 [pdf, other]

FedDAG: Federated DAG Structure Learning

Authors: Erdun Gao, Junjia Chen, Li Shen, Tongliang Liu, Mingming Gong, Howard Bondell

Abstract: To date, most directed acyclic graphs (DAGs) structure learning approaches require data to be stored in a central server. However, due to the consideration of privacy protection, data owners gradually refuse to share their personalized raw data to avoid private information leakage, making this task more troublesome by cutting off the first step. Thus, a puzzle arises: how do we discover the underlying DAG structure from decentralized data? In this paper, focusing on the additive noise models (ANMs) assumption of data generation, we take the first step in developing a gradient-based learning framework named FedDAG, which can learn the DAG structure without directly touching the local data and also can naturally handle the data heterogeneity. Our method benefits from a two-level structure of each local model. The first level structure learns the edges and directions of the graph and communicates with the server to get the model information from other clients during the learning procedure, while the second level structure approximates the mechanisms among variables and personally updates on its own data to accommodate the data heterogeneity. Moreover, FedDAG formulates the overall learning task as a continuous optimization problem by taking advantage of an equality acyclicity constraint, which can be solved by gradient descent methods to boost the searching efficiency. Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.

Submitted 16 January, 2023; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: Accepted to Transactions on Machine Learning Research

arXiv:2107.03227 [pdf, other]

Scalable Data Balancing for Unlabeled Satellite Imagery

Authors: Deep Patel, Erin Gao, Anirudh Koul, Siddha Ganju, Meher Anand Kasam

Abstract: Data imbalance is a ubiquitous problem in machine learning. In large-scale collected and annotated datasets, data imbalance is either mitigated manually by undersampling frequent classes and oversampling rare classes, or planned for with imputation and augmentation techniques. In both cases, balancing data requires labels. In other words, only annotated data can be balanced. Collecting fully annotated datasets is challenging, especially for large-scale satellite systems such as the unlabeled NASA's 35 PB Earth Imagery dataset. Although the NASA Earth Imagery dataset is unlabeled, there are implicit properties of the data source that we can rely on to hypothesize about its imbalance, such as the distribution of land and water in the case of the Earth's imagery. We present a new iterative method to balance unlabeled data. Our method uses image embeddings as a proxy for image labels to balance the data, which ultimately increases overall accuracy when the balanced data are used for training.

Submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted to COSPAR 2021 Workshop on Machine Learning for Space Sciences. 5 pages, 9 figures
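The abstract above uses image embeddings as a proxy for labels but does not spell out its iterative method. The sketch below swaps in a simple, commonly used substitute: cluster the embeddings with plain k-means, treat cluster ids as pseudo-labels, and undersample every cluster to the size of the smallest one. The choice of k-means, the cluster count `k`, and both function names are assumptions for illustration only.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every point to every center: shape (n, k).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def balanced_indices(embeddings, k, seed=0):
    """Use cluster ids as pseudo-labels and undersample each cluster to the smallest size."""
    labels = kmeans(embeddings, k, seed=seed)
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(labels == j) for j in range(k)]
    groups = [g for g in groups if len(g)]
    n = min(len(g) for g in groups)
    return np.concatenate([rng.choice(g, size=n, replace=False) for g in groups])
```

With 90 embeddings from one dominant region (say, water) and 10 from a rare one, this returns 10 indices from each cluster. Oversampling the rare clusters instead would be an equally valid design choice.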

arXiv:2102.03629 [pdf]

EEG-based Investigation of the Impact of Classroom Design on Cognitive Performance of Students

Authors: Jesus G. Cruz-Garza, Michael Darfler, James D. Rounds, Elita Gao, Saleh Kalantari

Abstract: This study investigated the neural dynamics associated with short-term exposure to different virtual classroom designs with different window placement and room dimensions. Participants engaged in five brief cognitive tasks in each design condition: the Stroop Test, the Digit Span Test, the Benton Test, a Visual Memory Test, and an Arithmetic Test. Performance on the cognitive tests and Electroencephalogram (EEG) data were analyzed by contrasting various classroom design conditions. The cognitive-test-performance results showed no significant differences related to the architectural design features studied. We computed frequency band-power and connectivity EEG features to identify neural patterns associated with environmental conditions. A leave-one-out machine learning classification scheme was implemented to assess the robustness of the EEG features, with the classification accuracy of the trained model repeatedly evaluated against an unseen participant's data. The classification results located consistent differences in the EEG features across participants in the different classroom design conditions, with a predictive power that was significantly higher compared to a baseline classification learning outcome using scrambled data. These findings were most robust during the Visual Memory Test, and were not found during the Stroop Test and the Arithmetic Test. The most discriminative EEG features were observed in bilateral occipital, parietal, and frontal regions in the theta and alpha frequency bands.
While the implications of these findings for student learning are yet to be determined, this study provides rigorous evidence that brain activity features during cognitive tasks are affected by the design elements of window placement and room dimensions.

Submitted 6 February, 2021; originally announced February 2021.
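The leave-one-out scheme described above evaluates each trained model on a participant it has never seen. A minimal sketch of that fold construction follows, assuming every EEG sample carries a participant id; the classifier itself is not shown. scikit-learn's `LeaveOneGroupOut` provides the same splitting for array inputs.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[Hashable],
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one participant per fold."""
    for held_out in dict.fromkeys(subject_ids):  # unique ids in first-seen order
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train, test
```

Splitting by participant, rather than by individual sample, keeps trials from the same person out of both sides of a fold. Otherwise subject-specific EEG traits could inflate accuracy.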

arXiv:1908.05965 [pdf]

Adaptive Embedding Pattern for Grayscale-Invariance Reversible Data Hiding

Authors: Erdun Gao, Zhibin Pan, Xinyi Gao

Abstract: In traditional reversible data hiding (RDH) methods, researchers focus on enlarging the embedding capacity (EC) and reducing the embedding distortion (ED). Recently, a completely novel RDH algorithm was developed to embed secret data into color images without changing the corresponding grayscale [1], which largely expands the applications of RDH. In [1], for color images, channels R and B are exploited to carry secret information, and channel G is adjusted to balance the modifications of channels R and B so that the grayscale stays invariant. However, we found that the embedding performance (EP) of that method is still unsatisfactory and could be further enhanced. To improve the EP, an adaptive embedding pattern is introduced to enhance the algorithm's ability to selectively embed different bits of secret data into pixels according to context information. Moreover, a novel two-level predictor is designed by uniting two normal predictors to reduce the ED when embedding more bits. Experimental results demonstrate that, compared to the previous method, our scheme could significantly enhance the image fidelity while keeping the grayscale invariant.

Submitted 16 August, 2019; originally announced August 2019.
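The grayscale-invariance constraint from [1], as summarized above, says that R and B carry the secret bits while G is adjusted so that the grayscale value does not change. The sketch below illustrates only that constraint, not the prediction or embedding steps of either paper. It assumes grayscale is computed as round(0.299R + 0.587G + 0.114B) (ITU-R BT.601 weights). It finds a compensating G by brute force over 0..255 and picks the candidate closest to the original G. Both function names are hypothetical.

```python
def gray(r, g, b):
    """8-bit grayscale with ITU-R BT.601 weights (assumed conversion)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def compensate_green(rgb, new_r, new_b):
    """Return the G closest to the original G that keeps gray(R, G, B) unchanged
    after R and B become new_r and new_b, or None if no 8-bit G works."""
    r, g, b = rgb
    target = gray(r, g, b)
    candidates = [g2 for g2 in range(256) if gray(new_r, g2, new_b) == target]
    if not candidates:
        return None
    return min(candidates, key=lambda g2: abs(g2 - g))
```

For the pixel (100, 150, 200), whose grayscale is 141, raising R to 110 leaves G = 145 or 146 as valid compensations. The function returns 146, the smaller change. A `None` result marks a pixel where embedding would break invariance and that a real scheme would have to skip.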

Showing 1–14 of 14 results for author: Gao, E