Search | arXiv e-print repository

Path-Specific Causal Reasoning for Fairness-aware Cognitive Diagnosis

Authors: Dacao Zhang, Kun Zhang, Le Wu, Mi Tian, Richang Hong, Meng Wang

Abstract: Cognitive Diagnosis~(CD), which leverages students and exercise data to predict students' proficiency levels on different knowledge concepts, is one of fundamental components in Intelligent Education. Due to the scarcity of student-exercise interaction data, most existing methods focus on making the best use of available data, such as exercise content and student information~(e.g., educational con… ▽ More Cognitive Diagnosis~(CD), which leverages students and exercise data to predict students' proficiency levels on different knowledge concepts, is one of fundamental components in Intelligent Education. Due to the scarcity of student-exercise interaction data, most existing methods focus on making the best use of available data, such as exercise content and student information~(e.g., educational context). Despite the great progress, the abuse of student sensitive information has not been paid enough attention. Due to the important position of CD in Intelligent Education, employing sensitive information when making diagnosis predictions will cause serious social issues. Moreover, data-driven neural networks are easily misled by the shortcut between input data and output prediction, exacerbating this problem. Therefore, it is crucial to eliminate the negative impact of sensitive information in CD models. In response, we argue that sensitive attributes of students can also provide useful information, and only the shortcuts directly related to the sensitive information should be eliminated from the diagnosis process. Thus, we employ causal reasoning and design a novel Path-Specific Causal Reasoning Framework (PSCRF) to achieve this goal. Specifically, we first leverage an encoder to extract features and generate embeddings for general information and sensitive information of students. Then, we design a novel attribute-oriented predictor to decouple the sensitive attributes, in which fairness-related sensitive features will be eliminated and other useful information will be retained. Finally, we designed a multi-factor constraint to ensure the performance of fairness and diagnosis performance simultaneously. Extensive experiments over real-world datasets (e.g., PISA dataset) demonstrate the effectiveness of our proposed PSCRF. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accpeted by KDD'2024

arXiv:2404.10595 [pdf, other]

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Authors: Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia

Abstract: Large Vision-Language Models (LVLMs) have received widespread attention in advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on the multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases. In this paper, we propose CODA-LM, the very first benchmark for the autom… ▽ More Large Vision-Language Models (LVLMs) have received widespread attention in advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on the multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases. In this paper, we propose CODA-LM, the very first benchmark for the automatic evaluation of LVLMs for self-driving corner cases. We adopt a hierarchical data structure to prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotation for human annotators, and for LVLM evaluation, we show that using the text-only large language models (LLMs) as judges reveals even better alignment with human preferences than the LVLM judges. Moreover, with CODA-LM, we build CODA-VLM, a new driving LVLM surpassing all the open-sourced counterparts on CODA-LM. Our CODA-VLM performs comparably with GPT-4V, even surpassing GPT-4V by +21.42% on the regional perception task. We hope CODA-LM can become the catalyst to promote interpretable self-driving empowered by LVLMs. △ Less

Submitted 26 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Project Page: https://coda-dataset.github.io/coda-lm/

arXiv:2404.10515 [pdf, other]

An Enhanced Differential Grou** Method for Large-Scale Overlap** Problems

Authors: Maojiang Tian, Mingke Chen, Wei Du, Yang Tang, Yaochu **

Abstract: Large-scale overlap** problems are prevalent in practical engineering applications, and the optimization challenge is significantly amplified due to the existence of shared variables. Decomposition-based cooperative coevolution (CC) algorithms have demonstrated promising performance in addressing large-scale overlap** problems. However, current CC frameworks designed for overlap** problems r… ▽ More Large-scale overlap** problems are prevalent in practical engineering applications, and the optimization challenge is significantly amplified due to the existence of shared variables. Decomposition-based cooperative coevolution (CC) algorithms have demonstrated promising performance in addressing large-scale overlap** problems. However, current CC frameworks designed for overlap** problems rely on grou** methods for the identification of overlap** problem structures and the current grou** methods for large-scale overlap** problems fail to consider both accuracy and efficiency simultaneously. In this article, we propose a two-stage enhanced grou** method for large-scale overlap** problems, called OEDG, which achieves accurate grou** while significantly reducing computational resource consumption. In the first stage, OEDG employs a grou** method based on the finite differences principle to identify all subcomponents and shared variables. In the second stage, we propose two grou** refinement methods, called subcomponent union detection (SUD) and subcomponent detection (SD), to enhance and refine the grou** results. SUD examines the information of the subcomponents and shared variables obtained in the previous stage, and SD corrects inaccurate grou** results. To better verify the performance of the proposed OEDG, we propose a series of novel benchmarks that consider various properties of large-scale overlap** problems, including the topology structure, overlap** degree, and separability. Extensive experimental results demonstrate that OEDG is capable of accurately grou** different types of large-scale overlap** problems while consuming fewer computational resources. Finally, we empirically verify that the proposed OEDG can effectively improve the optimization performance of diverse large-scale overlap** problems. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.01192 [pdf, other]

A Composite Decomposition Method for Large-Scale Global Optimization

Authors: Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, Yaochu **, Gary G. Yen

Abstract: Cooperative co-evolution (CC) algorithms, based on the divide-and-conquer strategy, have emerged as the predominant approach to solving large-scale global optimization (LSGO) problems. The efficiency and accuracy of the grou** stage significantly impact the performance of the optimization process. While the general separability grou** (GSG) method has overcome the limitation of previous differ… ▽ More Cooperative co-evolution (CC) algorithms, based on the divide-and-conquer strategy, have emerged as the predominant approach to solving large-scale global optimization (LSGO) problems. The efficiency and accuracy of the grou** stage significantly impact the performance of the optimization process. While the general separability grou** (GSG) method has overcome the limitation of previous differential grou** (DG) methods by enabling the decomposition of non-additively separable functions, it suffers from high computational complexity. To address this challenge, this article proposes a composite separability grou** (CSG) method, seamlessly integrating DG and GSG into a problem decomposition framework to utilize the strengths of both approaches. CSG introduces a step-by-step decomposition framework that accurately decomposes various problem types using fewer computational resources. By sequentially identifying additively, multiplicatively and generally separable variables, CSG progressively groups non-separable variables by recursively considering the interactions between each non-separable variable and the formed non-separable groups. Furthermore, to enhance the efficiency and accuracy of CSG, we introduce two innovative methods: a multiplicatively separable variable detection method and a non-separable variable grou** method. These two methods are designed to effectively detect multiplicatively separable variables and efficiently group non-separable variables, respectively. Extensive experimental results demonstrate that CSG achieves more accurate variable grou** with lower computational complexity compared to GSG and state-of-the-art DG series designs. △ Less

Submitted 8 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.00261 [pdf, other]

Spatial Cascaded Clustering and Weighted Memory for Unsupervised Person Re-identification

Authors: Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang

Abstract: Recent unsupervised person re-identification (re-ID) methods achieve high performance by leveraging fine-grained local context. These methods are referred to as part-based methods. However, most part-based methods obtain local contexts through horizontal division, which suffer from misalignment due to various human poses. Additionally, the misalignment of semantic information in part features rest… ▽ More Recent unsupervised person re-identification (re-ID) methods achieve high performance by leveraging fine-grained local context. These methods are referred to as part-based methods. However, most part-based methods obtain local contexts through horizontal division, which suffer from misalignment due to various human poses. Additionally, the misalignment of semantic information in part features restricts the use of metric learning, thus affecting the effectiveness of part-based methods. The two issues mentioned above result in the under-utilization of part features in part-based methods. We introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method to address these challenges. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the foreground omissions and spatial confusions issues in the previous method. Then, we propose foreground and space corrections to enhance the completeness and reasonableness of the human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, which enables better utilization of both global and part features. Extensive experiments on Market-1501 and MSMT17 validate the proposed method's effectiveness over many state-of-the-art methods. △ Less

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.19173 [pdf, other]

StarCoder 2 and The Stack v2: The Next Generation

Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo , et al. (41 additional authors not shown)

Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data… ▽ More The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data. △ Less

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17194 [pdf]

The Random Forest Model for Analyzing and Forecasting the US Stock Market in the Context of Smart Finance

Authors: Jiajian Zheng, Duan Xin, Qishuo Cheng, Miao Tian, Le Yang

Abstract: The stock market is a crucial component of the financial market, playing a vital role in wealth accumulation for investors, financing costs for listed companies, and the stable development of the national macroeconomy. Significant fluctuations in the stock market can damage the interests of stock investors and cause an imbalance in the industrial structure, which can interfere with the macro level… ▽ More The stock market is a crucial component of the financial market, playing a vital role in wealth accumulation for investors, financing costs for listed companies, and the stable development of the national macroeconomy. Significant fluctuations in the stock market can damage the interests of stock investors and cause an imbalance in the industrial structure, which can interfere with the macro level development of the national economy. The prediction of stock price trends is a popular research topic in academia. Predicting the three trends of stock pricesrising, sideways, and falling can assist investors in making informed decisions about buying, holding, or selling stocks. Establishing an effective forecasting model for predicting these trends is of substantial practical importance. This paper evaluates the predictive performance of random forest models combined with artificial intelligence on a test set of four stocks using optimal parameters. The evaluation considers both predictive accuracy and time efficiency. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 10 pages, 8 figures

arXiv:2402.17191 [pdf]

AI-Driven Anonymization: Protecting Personal Data Privacy While Leveraging Machine Learning

Authors: Le Yang, Miao Tian, Duan Xin, Qishuo Cheng, Jiajian Zheng

Abstract: The development of artificial intelligence has significantly transformed people's lives. However, it has also posed a significant threat to privacy and security, with numerous instances of personal information being exposed online and reports of criminal attacks and theft. Consequently, the need to achieve intelligent protection of personal information through machine learning algorithms has becom… ▽ More The development of artificial intelligence has significantly transformed people's lives. However, it has also posed a significant threat to privacy and security, with numerous instances of personal information being exposed online and reports of criminal attacks and theft. Consequently, the need to achieve intelligent protection of personal information through machine learning algorithms has become a paramount concern. Artificial intelligence leverages advanced algorithms and technologies to effectively encrypt and anonymize personal data, enabling valuable data analysis and utilization while safeguarding privacy. This paper focuses on personal data privacy protection and the promotion of anonymity as its core research objectives. It achieves personal data privacy protection and detection through the use of machine learning's differential privacy protection algorithm. The paper also addresses existing challenges in machine learning related to privacy and personal data protection, offers improvement suggestions, and analyzes factors impacting datasets to enable timely personal data privacy detection and protection. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 9 pages, 6 figures

arXiv:2402.15994 [pdf]

Optimizing Portfolio Management and Risk Assessment in Digital Assets Using Deep Learning for Predictive Analysis

Authors: Qishuo Cheng, Le Yang, Jiajian Zheng, Miao Tian, Duan Xin

Abstract: Portfolio management issues have been extensively studied in the field of artificial intelligence in recent years, but existing deep learning-based quantitative trading methods have some areas where they could be improved. First of all, the prediction mode of stocks is singular; often, only one trading expert is trained by a model, and the trading decision is solely based on the prediction results… ▽ More Portfolio management issues have been extensively studied in the field of artificial intelligence in recent years, but existing deep learning-based quantitative trading methods have some areas where they could be improved. First of all, the prediction mode of stocks is singular; often, only one trading expert is trained by a model, and the trading decision is solely based on the prediction results of the model. Secondly, the data source used by the model is relatively simple, and only considers the data of the stock itself, ignoring the impact of the whole market risk on the stock. In this paper, the DQN algorithm is introduced into asset management portfolios in a novel and straightforward way, and the performance greatly exceeds the benchmark, which fully proves the effectiveness of the DRL algorithm in portfolio management. This also inspires us to consider the complexity of financial problems, and the use of algorithms should be fully combined with the problems to adapt. Finally, in this paper, the strategy is implemented by selecting the assets and actions with the largest Q value. Since different assets are trained separately as environments, there may be a phenomenon of Q value drift among different assets (different assets have different Q value distribution areas), which may easily lead to incorrect asset selection. Consider adding constraints so that the Q values of different assets share a Q value distribution to improve results. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: 10 pages, 5 figures

arXiv:2312.17263 [pdf, other]

TACIT: A Target-Agnostic Feature Disentanglement Framework for Cross-Domain Text Classification

Authors: Rui Song, Fausto Giunchiglia, Yingji Li, Mingjie Tian, Hao Xu

Abstract: Cross-domain text classification aims to transfer models from label-rich source domains to label-poor target domains, giving it a wide range of practical applications. Many approaches promote cross-domain generalization by capturing domain-invariant features. However, these methods rely on unlabeled samples provided by the target domains, which renders the model ineffective when the target domain… ▽ More Cross-domain text classification aims to transfer models from label-rich source domains to label-poor target domains, giving it a wide range of practical applications. Many approaches promote cross-domain generalization by capturing domain-invariant features. However, these methods rely on unlabeled samples provided by the target domains, which renders the model ineffective when the target domain is agnostic. Furthermore, the models are easily disturbed by shortcut learning in the source domain, which also hinders the improvement of domain generalization ability. To solve the aforementioned issues, this paper proposes TACIT, a target domain agnostic feature disentanglement framework which adaptively decouples robust and unrobust features by Variational Auto-Encoders. Additionally, to encourage the separation of unrobust features from robust features, we design a feature distillation task that compels unrobust features to approximate the output of the teacher. The teacher model is trained with a few easy samples that are easy to carry potential unknown shortcuts. Experimental results verify that our framework achieves comparable results to state-of-the-art baselines while utilizing only source domain data. △ Less

Submitted 24 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI-2024

arXiv:2312.06614 [pdf, other]

AttenScribble: Attentive Similarity Learning for Scribble-Supervised Medical Image Segmentation

Authors: Mu Tian, Qinzhu Yang, Yi Gao

Abstract: The success of deep networks in medical image segmentation relies heavily on massive labeled training data. However, acquiring dense annotations is a time-consuming process. Weakly-supervised methods normally employ less expensive forms of supervision, among which scribbles started to gain popularity lately thanks to its flexibility. However, due to lack of shape and boundary information, it is ex… ▽ More The success of deep networks in medical image segmentation relies heavily on massive labeled training data. However, acquiring dense annotations is a time-consuming process. Weakly-supervised methods normally employ less expensive forms of supervision, among which scribbles started to gain popularity lately thanks to its flexibility. However, due to lack of shape and boundary information, it is extremely challenging to train a deep network on scribbles that generalizes on unlabeled pixels. In this paper, we present a straightforward yet effective scribble supervised learning framework. Inspired by recent advances of transformer based segmentation, we create a pluggable spatial self-attention module which could be attached on top of any internal feature layers of arbitrary fully convolutional network (FCN) backbone. The module infuses global interaction while kee** the efficiency of convolutions. Descended from this module, we construct a similarity metric based on normalized and symmetrized attention. This attentive similarity leads to a novel regularization loss that imposes consistency between segmentation prediction and visual affinity. This attentive similarity loss optimizes the alignment of FCN encoders, attention map** and model prediction. Ultimately, the proposed FCN+Attention architecture can be trained end-to-end guided by a combination of three learning objectives: partial segmentation loss, a customized masked conditional random fields and the proposed attentive similarity loss. Extensive experiments on public datasets (ACDC and CHAOS) showed that our framework not just out-performs existing state-of-the-art, but also delivers close performance to fully-supervised benchmark. Code will be available upon publication. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: 11 pages, 3 figures, a modified version was submitted to Computerized Medical Imaging and Graphics and is under review

arXiv:2312.06072 [pdf, other]

A dynamic interactive learning framework for automated 3D medical image segmentation

Authors: Mu Tian, Xiaohui Chen, Yi Gao

Abstract: Many deep learning based automated medical image segmentation systems, in reality, face difficulties in deployment due to the cost of massive data annotation and high latency in model iteration. We propose a dynamic interactive learning framework that addresses these challenges by integrating interactive segmentation into end-to-end weak supervised learning with streaming tasks. We develop novel r… ▽ More Many deep learning based automated medical image segmentation systems, in reality, face difficulties in deployment due to the cost of massive data annotation and high latency in model iteration. We propose a dynamic interactive learning framework that addresses these challenges by integrating interactive segmentation into end-to-end weak supervised learning with streaming tasks. We develop novel replay and label smoothing schemes that overcome catastrophic forgetting and improve online learning robustness. For each image, our multi-round interactive segmentation module simultaneously optimizes both front-end predictions and deep learning segmenter. In each round, a 3D "proxy mask" is propagated from sparse user inputs based on image registration, serving as weak supervision that enable knowledge distillation from the unknown ground truth. In return, the trained segmenter explicitly guides next step's user interventions according to a spatial residual map from consecutive front or back-end predictions. Evaluation on 3D segmentation tasks (NCI-ISBI2013 and BraTS2015) shows that our framework generates online learning performances that match offline training benchmark. In addition, with a 62% reduction in total annotation efforts, our framework produces competitive dice scores comparing to online and offline learning which equipped with full ground truth. Furthermore, such a framework, with its flexibility and responsiveness, could be deployed behind hospital firewall that guarantees data security and easy maintenance. △ Less

Submitted 10 December, 2023; originally announced December 2023.

Comments: 24 pages, 8 figures, under review

arXiv:2311.17088 [pdf, other]

Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies

Authors: Mulin Tian, Mahyar Khayatkhoei, Joe Mathai, Wael AbdAlmageed

Abstract: Deepfake videos present an increasing threat to society with potentially negative impact on criminal justice, democracy, and personal safety and privacy. Meanwhile, detecting deepfakes, at scale, remains a very challenging task that often requires labeled training data from existing deepfake generation methods. Further, even the most accurate supervised deepfake detection methods do not generalize… ▽ More Deepfake videos present an increasing threat to society with potentially negative impact on criminal justice, democracy, and personal safety and privacy. Meanwhile, detecting deepfakes, at scale, remains a very challenging task that often requires labeled training data from existing deepfake generation methods. Further, even the most accurate supervised deepfake detection methods do not generalize to deepfakes generated using new generation methods. In this paper, we propose a novel unsupervised method for detecting deepfake videos by directly identifying intra-modal and cross-modal inconsistency between video segments. The fundamental hypothesis behind the proposed detection method is that motion or identity inconsistencies are inevitable in deepfake videos. We will mathematically and empirically support this hypothesis, and then proceed to constructing our method grounded in our theoretical analysis. Our proposed method outperforms prior state-of-the-art unsupervised deepfake detection methods on the challenging FakeAVCeleb dataset, and also has several additional advantages: it is scalable because it does not require pristine (real) samples for each identity during inference and therefore can apply to arbitrarily many identities, generalizable because it is trained only on real videos and therefore does not rely on a particular deepfake method, reliable because it does not rely on any likelihood estimation in high dimensions, and explainable because it can pinpoint the exact location of modality inconsistencies which are then verifiable by a human expert. △ Less

Submitted 20 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 11 pages, 3 figures, 3 tables

arXiv:2311.13015 [pdf, other]

Fast and Interpretable Mortality Risk Scores for Critical Care Patients

Authors: Chloe Qinyu Zhu, Muhang Tian, Lesia Semenova, Jiachang Liu, Jack Xu, Joseph Scarpa, Cynthia Rudin

Abstract: Prediction of mortality in intensive care unit (ICU) patients is an important task in critical care medicine. Prior work in creating mortality risk models falls into two major categories: domain-expert-created scoring systems, and black box machine learning (ML) models. Both of these have disadvantages: black box models are unacceptable for use in hospitals, whereas manual creation of models (incl… ▽ More Prediction of mortality in intensive care unit (ICU) patients is an important task in critical care medicine. Prior work in creating mortality risk models falls into two major categories: domain-expert-created scoring systems, and black box machine learning (ML) models. Both of these have disadvantages: black box models are unacceptable for use in hospitals, whereas manual creation of models (including hand-tuning of logistic regression parameters) relies on humans to perform high-dimensional constrained optimization, which leads to a loss in performance. In this work, we bridge the gap between accurate black box models and hand-tuned interpretable models. We build on modern interpretable ML techniques to design accurate and interpretable mortality risk scores. We leverage the largest existing public ICU monitoring datasets, namely the MIMIC III and eICU datasets. By evaluating risk across medical centers, we are able to study generalization across domains. In order to customize our risk score models, we develop a new algorithm, GroupFasterRisk, which has several important benefits: (1) it uses hard sparsity constraint, allowing users to directly control the number of features; (2) it incorporates group sparsity to allow more cohesive models; (3) it allows for monotonicity correction on models for including domain knowledge; (4) it produces many equally-good models at once, which allows domain experts to choose among them. GroupFasterRisk creates its risk scores within hours, even on the large datasets we study here. GroupFasterRisk's risk scores perform better than risk scores currently used in hospitals, and have similar prediction performance to black box ML models (despite being much sparser). Because GroupFasterRisk produces a variety of risk scores and handles constraints, it allows design flexibility, which is the key enabler of practical and trustworthy model creation. △ Less

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2310.17190 [pdf, other]

Lookup Table meets Local Laplacian Filter: Pyramid Reconstruction Network for Tone Map**

Authors: Feng Zhang, Ming Tian, Zhiqiang Li, Bin Xu, Qingbo Lu, Changxin Gao, Nong Sang

Abstract: Tone map** aims to convert high dynamic range (HDR) images to low dynamic range (LDR) representations, a critical task in the camera imaging pipeline. In recent years, 3-Dimensional LookUp Table (3D LUT) based methods have gained attention due to their ability to strike a favorable balance between enhancement performance and computational efficiency. However, these methods often fail to deliver… ▽ More Tone map** aims to convert high dynamic range (HDR) images to low dynamic range (LDR) representations, a critical task in the camera imaging pipeline. In recent years, 3-Dimensional LookUp Table (3D LUT) based methods have gained attention due to their ability to strike a favorable balance between enhancement performance and computational efficiency. However, these methods often fail to deliver satisfactory results in local areas since the look-up table is a global operator for tone map**, which works based on pixel values and fails to incorporate crucial local information. To this end, this paper aims to address this issue by exploring a novel strategy that integrates global and local operators by utilizing closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we employ image-adaptive 3D LUTs to manipulate the tone in the low-frequency image by leveraging the specific characteristics of the frequency information. Furthermore, we utilize local Laplacian filters to refine the edge details in the high-frequency components in an adaptive manner. Local Laplacian filters are widely used to preserve edge details in photographs, but their conventional usage involves manual tuning and fixed implementation within camera imaging pipelines or photo editing tools. We propose to learn parameter value maps progressively for local Laplacian filters from annotated data using a lightweight network. Our model achieves simultaneous global tone manipulation and local edge detail preservation in an end-to-end manner. Extensive experimental results on two benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods. △ Less

Submitted 3 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 12 pages, 6 figures, accepted by NeurlPS 2023

arXiv:2310.15290 [pdf, other]

Reliable Generation of EHR Time Series via Diffusion Models

Authors: Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang

Abstract: Electronic Health Records (EHRs) are rich sources of patient-level data, including laboratory tests, medications, and diagnoses, offering valuable resources for medical data analysis. However, concerns about privacy often restrict access to EHRs, hindering downstream analysis. Researchers have explored various methods for generating privacy-preserving EHR data. In this study, we introduce a new me… ▽ More Electronic Health Records (EHRs) are rich sources of patient-level data, including laboratory tests, medications, and diagnoses, offering valuable resources for medical data analysis. However, concerns about privacy often restrict access to EHRs, hindering downstream analysis. Researchers have explored various methods for generating privacy-preserving EHR data. In this study, we introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six datasets, comparing our proposed method with eight existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data utility while requiring less training effort. Our approach also enhances downstream medical data analysis by providing diverse and realistic synthetic EHR data. △ Less

Submitted 21 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14821 [pdf, other]

Mysticeti: Reaching the Limits of Latency with Uncertified DAGs

Authors: Kushal Babel, Andrey Chursin, George Danezis, Anastasios Kichidis, Lefteris Kokoris-Kogias, Arun Koshy, Alberto Sonnino, Mingwei Tian

Abstract: We introduce Mysticeti-C the first DAG-based Byzantine consensus protocol to achieve the lower bounds of latency of 3 message rounds. Since Mysticeti-C is built over DAGs it also achieves high resource efficiency and censorship resistance. Mysticeti-C achieves this latency improvement by avoiding explicit certification of the DAG blocks and by proposing a novel commit rule such that every block ca… ▽ More We introduce Mysticeti-C the first DAG-based Byzantine consensus protocol to achieve the lower bounds of latency of 3 message rounds. Since Mysticeti-C is built over DAGs it also achieves high resource efficiency and censorship resistance. Mysticeti-C achieves this latency improvement by avoiding explicit certification of the DAG blocks and by proposing a novel commit rule such that every block can be committed without delays, resulting in optimal latency in the steady state and under crash failures. We further extend Mysticeti-C to Mysticeti-FPC, which incorporates a fast commit path that achieves even lower latency for transferring assets. Unlike prior fast commit path protocols, Mysticeti-FPC minimizes the number of signatures and messages by weaving the fast path transactions into the DAG. This frees up resources, which subsequently result in better performance. We prove the safety and liveness of the protocols in a Byzantine context. We evaluate Mysticeti and compare it with state-of-the-art consensus and fast path protocols to demonstrate its low latency and resource efficiency, as well as its more graceful degradation under crash failures. Mysticeti is the first Byzantine consensus protocol to achieve WAN latency of 0.5s for consensus commit while simultaneously maintaining state-of-the-art throughput of over 100k TPS. Finally, we report on integrating Mysticeti-C as the consensus protocol into a major blockchain, resulting in 4x latency reduction. △ Less

Submitted 30 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.00052 [pdf, other]

AI ensemble for signal detection of higher order gravitational wave modes of quasi-circular, spinning, non-precessing binary black hole mergers

Authors: Minyang Tian, E. A. Huerta, Huihuo Zheng

Abstract: We introduce spatiotemporal-graph models that concurrently process data from the twin advanced LIGO detectors and the advanced Virgo detector. We trained these AI classifiers with 2.4 million IMRPhenomXPHM waveforms that describe quasi-circular, spinning, non-precessing binary black hole mergers with component masses $m_{\{1,2\}}\in[3M_\odot, 50 M_\odot]$, and individual spins… ▽ More We introduce spatiotemporal-graph models that concurrently process data from the twin advanced LIGO detectors and the advanced Virgo detector. We trained these AI classifiers with 2.4 million IMRPhenomXPHM waveforms that describe quasi-circular, spinning, non-precessing binary black hole mergers with component masses $m_{\{1,2\}}\in[3M_\odot, 50 M_\odot]$, and individual spins $s^z_{\{1,2\}}\in[-0.9, 0.9]$; and which include the $(\ell, |m|) = \{(2, 2), (2, 1), (3, 3), (3, 2), (4, 4)\}$ modes, and mode mixing effects in the $\ell = 3, |m| = 2$ harmonics. We trained these AI classifiers within 22 hours using distributed training over 96 NVIDIA V100 GPUs in the Summit supercomputer. We then used transfer learning to create AI predictors that estimate the total mass of potential binary black holes identified by all AI classifiers in the ensemble. We used this ensemble, 3 classifiers for signal detection and 2 total mass predictors, to process a year-long test set in which we injected 300,000 signals. This year-long test set was processed within 5.19 minutes using 1024 NVIDIA A100 GPUs in the Polaris supercomputer (for AI inference) and 128 CPU nodes in the ThetaKNL supercomputer (for post-processing of noise triggers), housed at the Argonne Leadership Computing Facility. These studies indicate that our AI ensemble provides state-of-the-art signal detection accuracy, and reports 2 misclassifications for every year of searched data. This is the first AI ensemble designed to search for and find higher order gravitational wave mode signals. △ Less

Submitted 4 December, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: 4 pages, 2 figures, 1 table; v2: 5 pages, 2 figures, 1 table, accepted to NeurIPS 2023 workshop on Machine Learning and the Physical Sciences

MSC Class: 68T01; 68T35; 83C35; 83C57

arXiv:2307.02019 [pdf]

Generative Adversarial Networks for Dental Patient Identity Protection in Orthodontic Educational Imaging

Authors: Mingchuan Tian, Wilson Weixun Lu, Kelvin Weng Chiong Foong, Eugene Loh

Abstract: Objectives: This research introduces a novel area-preserving Generative Adversarial Networks (GAN) inversion technique for effectively de-identifying dental patient images. This innovative method addresses privacy concerns while preserving key dental features, thereby generating valuable resources for dental education and research. Methods: We enhanced the existing GAN Inversion methodology to m… ▽ More Objectives: This research introduces a novel area-preserving Generative Adversarial Networks (GAN) inversion technique for effectively de-identifying dental patient images. This innovative method addresses privacy concerns while preserving key dental features, thereby generating valuable resources for dental education and research. Methods: We enhanced the existing GAN Inversion methodology to maximize the preservation of dental characteristics within the synthesized images. A comprehensive technical framework incorporating several deep learning models was developed to provide end-to-end development guidance and practical application for image de-identification. Results: Our approach was assessed with varied facial pictures, extensively used for diagnosing skeletal asymmetry and facial anomalies. Results demonstrated our model's ability to adapt the context from one image to another, maintaining compatibility, while preserving dental features essential for oral diagnosis and dental education. A panel of five clinicians conducted an evaluation on a set of original and GAN-processed images. The generated images achieved effective de-identification, maintaining the realism of important dental features and were deemed useful for dental diagnostics and education. Clinical Significance: Our GAN model and the encompassing framework can streamline the de-identification process of dental patient images, enhancing efficiency in dental education. This method improves students' diagnostic capabilities by offering more exposure to orthodontic malocclusions. Furthermore, it facilitates the creation of de-identified datasets for broader 2D image research at major research institutions. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.15914 [pdf, other]

The 2nd Place Solution for 2023 Waymo Open Sim Agents Challenge

Authors: Cheng Qian, Di Xiu, Minghao Tian

Abstract: In this technical report, we present the 2nd place solution of 2023 Waymo Open Sim Agents Challenge (WOSAC)[4]. We propose a simple yet effective autoregressive method for simulating multi-agent behaviors, which is built upon a well-known multimodal motion forecasting framework called Motion Transformer (MTR)[5] with postprocessing algorithms applied. Our submission named MTR+++ achieves 0.4697 on… ▽ More In this technical report, we present the 2nd place solution of 2023 Waymo Open Sim Agents Challenge (WOSAC)[4]. We propose a simple yet effective autoregressive method for simulating multi-agent behaviors, which is built upon a well-known multimodal motion forecasting framework called Motion Transformer (MTR)[5] with postprocessing algorithms applied. Our submission named MTR+++ achieves 0.4697 on the Realism Meta metric in 2023 WOSAC. Besides, a modified model based on MTR named MTR_E is proposed after the challenge, which has a better score 0.4911 and is ranked the 3rd on the leaderboard of WOSAC as of June 25, 2023. △ Less

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.15728 [pdf, other]

doi 10.1088/2632-2153/ad4c37

Physics-inspired spatiotemporal-graph AI ensemble for the detection of higher order wave mode signals of spinning binary black hole mergers

Authors: Minyang Tian, E. A. Huerta, Huihuo Zheng, Prayush Kumar

Abstract: We present a new class of AI models for the detection of quasi-circular, spinning, non-precessing binary black hole mergers whose waveforms include the higher order gravitational wave modes $(l, |m|)=\{(2, 2), (2, 1), (3, 3), (3, 2), (4, 4)\}$, and mode mixing effects in the $l = 3, |m| = 2$ harmonics. These AI models combine hybrid dilated convolution neural networks to accurately model both shor… ▽ More We present a new class of AI models for the detection of quasi-circular, spinning, non-precessing binary black hole mergers whose waveforms include the higher order gravitational wave modes $(l, |m|)=\{(2, 2), (2, 1), (3, 3), (3, 2), (4, 4)\}$, and mode mixing effects in the $l = 3, |m| = 2$ harmonics. These AI models combine hybrid dilated convolution neural networks to accurately model both short- and long-range temporal sequential information of gravitational waves; and graph neural networks to capture spatial correlations among gravitational wave observatories to consistently describe and identify the presence of a signal in a three detector network encompassing the Advanced LIGO and Virgo detectors. We first trained these spatiotemporal-graph AI models using synthetic noise, using 1.2 million modeled waveforms to densely sample this signal manifold, within 1.7 hours using 256 A100 GPUs in the Polaris supercomputer at the ALCF. Our distributed training approach had optimal performance, and strong scaling up to 512 A100 GPUs. With these AI ensembles we processed data from a three detector network, and found that an ensemble of 4 AI models achieves state-of-the-art performance for signal detection, and reports two misclassifications for every decade of searched data. We distributed AI inference over 128 GPUs in the Polaris supercomputer and 128 nodes in the Theta supercomputer, and completed the processing of a decade of gravitational wave data from a three detector network within 3.5 hours. Finally, we fine-tuned these AI ensembles to process the entire month of February 2020, which is part of the O3b LIGO/Virgo observation run, and found 6 gravitational waves, concurrently identified in Advanced LIGO and Advanced Virgo data, and zero false positives. This analysis was completed in one hour using one A100 GPU. △ Less

Submitted 18 June, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: 14 pages, 6 figures, and 3 tables

MSC Class: 68T01; 68T35; 83C35; 83C57

Journal ref: Mach. Learn.: Sci. Technol. 5 (2024) 025056

arXiv:2304.07238 [pdf, other]

doi 10.1103/PhysRevE.108.054302

Robustness of community structure under edge addition

Authors: Moyi Tian, Pablo Moriano

Abstract: Communities often represent key structural and functional clusters in networks. To preserve such communities, it is important to understand their robustness under network perturbations. Previous work in community robustness analysis has focused on studying changes in the community structure as a response of edge rewiring and node or edge removal. However, the impact of increasing connectivity on t… ▽ More Communities often represent key structural and functional clusters in networks. To preserve such communities, it is important to understand their robustness under network perturbations. Previous work in community robustness analysis has focused on studying changes in the community structure as a response of edge rewiring and node or edge removal. However, the impact of increasing connectivity on the robustness of communities in networked systems is relatively unexplored. Studying the limits of community robustness under edge addition is crucial to better understanding the cases in which density expands or false edges erroneously appear. In this paper, we analyze the effect of edge addition on community robustness in synthetic and empirical temporal networks. We study two scenarios of edge addition: random and targeted. We use four community detection algorithms, Infomap, Label Propagation, Leiden, and Louvain, and demonstrate the results in community similarity metrics. The experiments on synthetic networks show that communities are more robust when the initial partition is stronger or the edge addition is random, and the experiments on empirical data also indicate that robustness performance can be affected by the community similarity metric. Overall, our results suggest that the communities identified by the different types of community detection algorithms exhibit different levels of robustness, and so the robustness of communities depends strongly on the choice of detection method. △ Less

Submitted 1 November, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: 17 pages, 30 figures

Journal ref: Phys. Rev. E 108 (2023) 054302

arXiv:2302.14350 [pdf, other]

Knowledge Augmented Relation Inference for Group Activity Recognition

Authors: Xianglong Lang, Zhuming Wang, Zun Li, Meng Tian, Ge Shi, Lifang Wu, Liang Wang

Abstract: Most existing group activity recognition methods construct spatial-temporal relations merely based on visual representation. Some methods introduce extra knowledge, such as action labels, to build semantic relations and use them to refine the visual presentation. However, the knowledge they explored just stay at the semantic-level, which is insufficient for pursing notable accuracy. In this paper,… ▽ More Most existing group activity recognition methods construct spatial-temporal relations merely based on visual representation. Some methods introduce extra knowledge, such as action labels, to build semantic relations and use them to refine the visual presentation. However, the knowledge they explored just stay at the semantic-level, which is insufficient for pursing notable accuracy. In this paper, we propose to exploit knowledge concretization for the group activity recognition, and develop a novel Knowledge Augmented Relation Inference framework that can effectively use the concretized knowledge to improve the individual representations. Specifically, the framework consists of a Visual Representation Module to extract individual appearance features, a Knowledge Augmented Semantic Relation Module explore semantic representations of individual actions, and a Knowledge-Semantic-Visual Interaction Module aims to integrate visual and semantic information by the knowledge. Benefiting from these modules, the proposed framework can utilize knowledge to enhance the relation inference process and the individual representations, thus improving the performance of group activity recognition. Experimental results on two public datasets show that the proposed framework achieves competitive performance compared with state-of-the-art methods. △ Less

Submitted 1 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

arXiv:2212.01382 [pdf, other]

doi 10.5555/3545946.3598870

Welfare and Fairness in Multi-objective Reinforcement Learning

Authors: Zimeng Fan, Nianli Peng, Muhang Tian, Brandon Fain

Abstract: We study fair multi-objective reinforcement learning in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward. Motivated by the fair resource allocation literature, we model this as an expected welfare maximization problem, for some nonlinear fair welfare function of the vector of long-term cumulative rewards. One canonical exa… ▽ More We study fair multi-objective reinforcement learning in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward. Motivated by the fair resource allocation literature, we model this as an expected welfare maximization problem, for some nonlinear fair welfare function of the vector of long-term cumulative rewards. One canonical example of such a function is the Nash Social Welfare, or geometric mean, the log transform of which is also known as the Proportional Fairness objective. We show that even approximately optimal optimization of the expected Nash Social Welfare is computationally intractable even in the tabular case. Nevertheless, we provide a novel adaptation of Q-learning that combines nonlinear scalarized learning updates and non-stationary action selection to learn effective policies for optimizing nonlinear welfare functions. We show that our algorithm is provably convergent, and we demonstrate experimentally that our approach outperforms techniques based on linear scalarization, mixtures of optimal linear scalarizations, or stationary action selection for the Nash Social Welfare Objective. △ Less

Submitted 12 November, 2023; v1 submitted 29 November, 2022; originally announced December 2022.

arXiv:2211.10805 [pdf, other]

On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation

Authors: Matias D. Cattaneo, Jason M. Klusowski, Peter M. Tian

Abstract: Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogenous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference is conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (train… ▽ More Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogenous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference is conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm with non-vanishing probability, even with pruning. Instead, the convergence may be arbitrarily slow or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poor performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are seen to each distinctively contribute to achieving nearly optimal performance for the model class considered. △ Less

Submitted 6 February, 2024; v1 submitted 19 November, 2022; originally announced November 2022.

arXiv:2209.07862 [pdf, other]

doi 10.1145/3585088.3589353

What Do Children and Parents Want and Perceive in Conversational Agents? Towards Transparent, Trustworthy, Democratized Agents

Authors: Jessica Van Brummelen, Maura Kelleher, Mingyan Claire Tian, Nghi Hoang Nguyen

Abstract: Historically, researchers have focused on analyzing WEIRD, adult perspectives on technology. This means we may not have technology developed appropriately for children and those from non-WEIRD countries. In this paper, we analyze children and parents from various countries' perspectives on an emerging technology: conversational agents. We aim to better understand participants' trust of agents, par… ▽ More Historically, researchers have focused on analyzing WEIRD, adult perspectives on technology. This means we may not have technology developed appropriately for children and those from non-WEIRD countries. In this paper, we analyze children and parents from various countries' perspectives on an emerging technology: conversational agents. We aim to better understand participants' trust of agents, partner models, and their ideas of "ideal future agents" such that researchers can better design for these users. Additionally, we empower children and parents to program their own agents through educational workshops, and present changes in perceptions as participants create and learn about agents. Results from the study (n=49) included how children felt agents were significantly more human-like, warm, and dependable than parents did, how participants trusted agents more than parents or friends for correct information, how children described their ideal agents as being more artificial than human-like than parents did, and how children tended to focus more on fun features, approachable/friendly features and addressing concerns through agent design than parents did, among other results. We also discuss potential agent design implications of the results, including how designers may be able to best foster appropriate levels of trust towards agents by focusing on designing agents' competence and predictability indicators, as well as increasing transparency in terms of agents' information sources. △ Less

Submitted 20 January, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: 18 pages, 9 figures, submitted to IDC 2023, for associated appendix: https://gist.github.com/jessvb/fa1d4c75910106d730d194ffd4d725d3

arXiv:2209.05063 [pdf, other]

Learning Affects Trust: Design Recommendations and Concepts for Teaching Children -- and Nearly Anyone -- about Conversational Agents

Authors: Jessica Van Brummelen, Mingyan Claire Tian, Maura Kelleher, Nghi Hoang Nguyen

Abstract: Research has shown that human-agent relationships form in similar ways to human-human relationships. Since children do not have the same critical analysis skills as adults (and may over-trust technology, for example), this relationship-formation is concerning. Nonetheless, little research investigates children's perceptions of conversational agents in-depth, and even less investigates how educatio… ▽ More Research has shown that human-agent relationships form in similar ways to human-human relationships. Since children do not have the same critical analysis skills as adults (and may over-trust technology, for example), this relationship-formation is concerning. Nonetheless, little research investigates children's perceptions of conversational agents in-depth, and even less investigates how education might change these perceptions. We present K-12 workshops with associated conversational AI concepts to encourage healthier understanding and relationships with agents. Through studies with the curriculum, and children and parents from various countries, we found participants' perceptions of agents -- specifically their partner models and trust -- changed. When participants discussed changes in trust of agents, we found they most often mentioned learning something. For example, they frequently mentioned learning where agents obtained information, what agents do with this information and how agents are programmed. Based on the results, we developed recommendations for teaching conversational agent concepts, including emphasizing the concepts students found most challenging, like training, turn-taking and terminology; supplementing agent development activities with related learning activities; fostering appropriate levels of trust towards agents; and fostering accurate partner models of agents. Through such pedagogy, students can learn to better understand conversational AI and what it means to have it in the world. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: 9 pages, 11 figures, submitted to EAAI at AAAI 2023, for associated appendix: https://gist.github.com/jessvb/e35bc0daf859c30f73008a1ad1b37824

arXiv:2208.11353 [pdf, ps, other]

Research on Mask Wearing Detection of Natural Population Based on Improved YOLOv4

Authors: Xuecheng Wu, Mengmeng Tian, Lanhang Zhai

Abstract: Recently, the domestic COVID-19 epidemic situation has been serious, but in some public places, some people do not wear masks or wear masks incorrectly, which requires the relevant staff to instantly remind and supervise them to wear masks correctly. However, in the face of such important and complicated work, it is necessary to carry out automated mask wearing detection in public places. This pap… ▽ More Recently, the domestic COVID-19 epidemic situation has been serious, but in some public places, some people do not wear masks or wear masks incorrectly, which requires the relevant staff to instantly remind and supervise them to wear masks correctly. However, in the face of such important and complicated work, it is necessary to carry out automated mask wearing detection in public places. This paper proposes a new mask wearing detection method based on the improved YOLOv4. Specifically, firstly, we add the Coordinate Attention Module to the backbone to coordinate feature fusion and representation. Secondly, we conduct a series of network structural improvements to enhance the model performance and robustness. Thirdly, we deploy the K-means clustering algorithm to make the nine anchor boxes more suitable for our NPMD dataset. The experimental results show that the improved YOLOv4 performs better, exceeding the baseline by 4.06% AP with a comparable speed of 64.37 FPS. △ Less

Submitted 24 August, 2022; originally announced August 2022.

Comments: 4 pages, 1 figures

arXiv:2208.11346 [pdf, ps, other]

ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data

Authors: Xuecheng Wu, Mengmeng Tian, Lanhang Zhai

Abstract: With the fast development of artificial intelligence and short videos, emotion recognition in short videos has become one of the most important research topics in human-computer interaction. At present, most emotion recognition methods still stay in a single modality. However, in daily life, human beings will usually disguise their real emotions, which leads to the problem that the accuracy of sin… ▽ More With the fast development of artificial intelligence and short videos, emotion recognition in short videos has become one of the most important research topics in human-computer interaction. At present, most emotion recognition methods still stay in a single modality. However, in daily life, human beings will usually disguise their real emotions, which leads to the problem that the accuracy of single modal emotion recognition is relatively terrible. Moreover, it is not easy to distinguish similar emotions. Therefore, we propose a new approach denoted as ICANet to achieve multimodal short video emotion recognition by employing three different modalities of audio, video and optical flow, making up for the lack of a single modality and then improving the accuracy of emotion recognition in short videos. ICANet has a better accuracy of 80.77% on the IEMOCAP benchmark, exceeding the SOTA methods by 15.89%. △ Less

Submitted 24 August, 2022; originally announced August 2022.

Comments: 4 pages, 5 figures

arXiv:2206.05488 [pdf]

Kaggle Kinship Recognition Challenge: Introduction of Convolution-Free Model to boost conventional

Authors: Mingchuan Tian, Guangway Teng, Yipeng Bao

Abstract: This work aims to explore a convolution-free base classifier that can be used to widen the variations of the conventional ensemble classifier. Specifically, we propose Vision Transformers as base classifiers to combine with CNNs for a unique ensemble solution in Kaggle kinship recognition. In this paper, we verify our proposed idea by implementing and optimizing variants of the Vision Transformer… ▽ More This work aims to explore a convolution-free base classifier that can be used to widen the variations of the conventional ensemble classifier. Specifically, we propose Vision Transformers as base classifiers to combine with CNNs for a unique ensemble solution in Kaggle kinship recognition. In this paper, we verify our proposed idea by implementing and optimizing variants of the Vision Transformer model on top of the existing CNN models. The combined models achieve better scores than conventional ensemble classifiers based solely on CNN variants. We demonstrate that highly optimized CNN ensembles publicly available on the Kaggle Discussion board can easily achieve a significant boost in ROC score by simply ensemble with variants of the Vision Transformer model due to low correlation. △ Less

Submitted 11 June, 2022; originally announced June 2022.

arXiv:2205.10323 [pdf]

Low power communication signal enhancement method of Internet of things based on nonlocal mean denoising

Authors: Mingchuan Tian, Jizheng Liu

Abstract: In order to improve the transmission effect of low-power communication signal of Internet of things and compress the enhancement time of low-power communication signal, this paper designs a low-power communication signal enhancement method of Internet of things based on nonlocal mean denoising. Firstly, the residual of one-dimensional communication layer is pre processed by convolution core to obt… ▽ More In order to improve the transmission effect of low-power communication signal of Internet of things and compress the enhancement time of low-power communication signal, this paper designs a low-power communication signal enhancement method of Internet of things based on nonlocal mean denoising. Firstly, the residual of one-dimensional communication layer is pre processed by convolution core to obtain the residual of one-dimensional communication layer; Then, according to the two classification recognition method, the noise reduction signal feature recognition of the low-power communication signal of the Internet of things is realized, the non local mean noise reduction algorithm is used to remove the low-power communication signal of the Internet of things, and the weight value between similar blocks is calculated according to the European distance method. Finally, the low-power communication signal enhancement of the Internet of things is realized by the non local mean value denoising method. The experimental results show that the communication signal enhancement time overhead of this method is low, which is always less than 2.6s. The lowest bit error rate after signal enhancement is about 1%, and the signal-to-noise ratio is up to 18 dB, which shows that this method can achieve signal enhancement. △ Less

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2203.03498 [pdf, other]

Weakly Supervised Learning of Keypoints for 6D Object Pose Estimation

Authors: Meng Tian, Gim Hee Lee

Abstract: State-of-the-art approaches for 6D object pose estimation require large amounts of labeled data to train the deep networks. However, the acquisition of 6D object pose annotations is tedious and labor-intensive in large quantity. To alleviate this problem, we propose a weakly supervised 6D object pose estimation approach based on 2D keypoint detection. Our method trains only on image pairs with kno… ▽ More State-of-the-art approaches for 6D object pose estimation require large amounts of labeled data to train the deep networks. However, the acquisition of 6D object pose annotations is tedious and labor-intensive in large quantity. To alleviate this problem, we propose a weakly supervised 6D object pose estimation approach based on 2D keypoint detection. Our method trains only on image pairs with known relative transformations between their viewpoints. Specifically, we assign a set of arbitrarily chosen 3D keypoints to represent each unknown target 3D object and learn a network to detect their 2D projections that comply with the relative camera viewpoints. During inference, our network first infers the 2D keypoints from the query image and a given labeled reference image. We then use these 2D keypoints and the arbitrarily chosen 3D keypoints retained from training to infer the 6D object pose. Extensive experiments demonstrate that our approach achieves comparable performance with state-of-the-art fully supervised approaches. △ Less

Submitted 7 March, 2022; originally announced March 2022.

arXiv:2202.09554 [pdf]

doi 10.1016/j.autcon.2022.104499

SODA: Site Object Detection dAtaset for Deep Learning in Construction

Authors: Rui Duan, Hui Deng, Mao Tian, Yichuan Deng, Jiarui Lin

Abstract: Computer vision-based deep learning object detection algorithms have been developed sufficiently powerful to support the ability to recognize various objects. Although there are currently general datasets for object detection, there is still a lack of large-scale, open-source dataset for the construction industry, which limits the developments of object detection algorithms as they tend to be data… ▽ More Computer vision-based deep learning object detection algorithms have been developed sufficiently powerful to support the ability to recognize various objects. Although there are currently general datasets for object detection, there is still a lack of large-scale, open-source dataset for the construction industry, which limits the developments of object detection algorithms as they tend to be data-hungry. Therefore, this paper develops a new large-scale image dataset specifically collected and annotated for the construction site, called Site Object Detection dAtaset (SODA), which contains 15 kinds of object classes categorized by workers, materials, machines, and layout. Firstly, more than 20,000 images were collected from multiple construction sites in different site conditions, weather conditions, and construction phases, which covered different angles and perspectives. After careful screening and processing, 19,846 images including 286,201 objects were then obtained and annotated with labels in accordance with predefined categories. Statistical analysis shows that the developed dataset is advantageous in terms of diversity and volume. Further evaluation with two widely-adopted object detection algorithms based on deep learning (YOLO v3/ YOLO v4) also illustrates the feasibility of the dataset for typical construction scenarios, achieving a maximum mAP of 81.47%. In this manner, this research contributes a large-scale image dataset for the development of deep learning-based object detection methods in the construction industry and sets up a performance benchmark for further evaluation of corresponding algorithms in this area. △ Less

Submitted 19 February, 2022; originally announced February 2022.

Journal ref: Automation in Construction, 2022

arXiv:2202.06548 [pdf, other]

A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis

Authors: Yu Fu, Shunjie Dong, Yi Liao, Le Xue, Yuanfan Xu, Feng Li, Qianqian Yang, Tianbai Yu, Mei Tian, Cheng Zhuo

Abstract: 18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing the low-dose PET (L-PET) images to the high-quality full-dose PET (F-PET) ones is an effective way that both… ▽ More 18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing the low-dose PET (L-PET) images to the high-quality full-dose PET (F-PET) ones is an effective way that both reduces the radiation exposure and remains diagnostic accuracy. In this paper, we propose a resource-efficient deep learning framework for L-PET reconstruction and analysis, referred to as transGAN-SDAM, to generate F-PET from corresponding L-PET, and quantify the standard uptake value ratios (SUVRs) of these generated F-PET at whole brain. The transGAN-SDAM consists of two modules: a transformer-encoded Generative Adversarial Network (transGAN) and a Spatial Deformable Aggregation Module (SDAM). The transGAN generates higher quality F-PET images, and then the SDAM integrates the spatial information of a sequence of generated F-PET slices to synthesize whole-brain F-PET images. Experimental results demonstrate the superiority and rationality of our approach. △ Less

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2201.11133 [pdf, other]

doi 10.3389/frai.2022.828672

Inference-optimized AI and high performance computing for gravitational wave detection at scale

Authors: Pranshu Chaturvedi, Asad Khan, Minyang Tian, E. A. Huerta, Huihuo Zheng

Abstract: We introduce an ensemble of artificial intelligence models for gravitational wave detection that we trained in the Summit supercomputer using 32 nodes, equivalent to 192 NVIDIA V100 GPUs, within 2 hours. Once fully trained, we optimized these models for accelerated inference using NVIDIA TensorRT. We deployed our inference-optimized AI ensemble in the ThetaGPU supercomputer at Argonne Leadership C… ▽ More We introduce an ensemble of artificial intelligence models for gravitational wave detection that we trained in the Summit supercomputer using 32 nodes, equivalent to 192 NVIDIA V100 GPUs, within 2 hours. Once fully trained, we optimized these models for accelerated inference using NVIDIA TensorRT. We deployed our inference-optimized AI ensemble in the ThetaGPU supercomputer at Argonne Leadership Computer Facility to conduct distributed inference. Using the entire ThetaGPU supercomputer, consisting of 20 nodes each of which has 8 NVIDIA A100 Tensor Core GPUs and 2 AMD Rome CPUs, our NVIDIA TensorRT-optimized AI ensemble processed an entire month of advanced LIGO data (including Hanford and Livingston data streams) within 50 seconds. Our inference-optimized AI ensemble retains the same sensitivity of traditional AI models, namely, it identifies all known binary black hole mergers previously identified in this advanced LIGO dataset and reports no misclassifications, while also providing a 3X inference speedup compared to traditional artificial intelligence models. We used time slides to quantify the performance of our AI ensemble to process up to 5 years worth of advanced LIGO data. In this synthetically enhanced dataset, our AI ensemble reports an average of one misclassification for every month of searched advanced LIGO data. We also present the receiver operating characteristic curve of our AI ensemble using this 5 year long advanced LIGO dataset. This approach provides the required tools to conduct accelerated, AI-driven gravitational wave detection at scale. △ Less

Submitted 17 February, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: 19 pages, 8 figures; v2. Accepted to Frontiers in Artificial Intelligence, Special Issue: Efficient AI in Particle Physics and Astrophysics

MSC Class: 68T10; 85-08; 83C35; 83C57 ACM Class: I.2

Journal ref: Front. Artif. Intell. 5:828672 (2022)

arXiv:2201.04019 [pdf, other]

Pyramid Fusion Transformer for Semantic Segmentation

Authors: Zipeng Qin, Jianbo Liu, Xiaolin Zhang, Maoqing Tian, Aojun Zhou, Shuai Yi, Hongsheng Li

Abstract: The recently proposed MaskFormer gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates paired probabilities and masks corresponding to category segments and combines them during inference for the segmentation maps. In our study, we find that per-mask classifi… ▽ More The recently proposed MaskFormer gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates paired probabilities and masks corresponding to category segments and combines them during inference for the segmentation maps. In our study, we find that per-mask classification decoder on top of a single-scale feature is not effective enough to extract reliable probability or mask. To mine for rich semantic information across the feature pyramid, we propose a transformer-based Pyramid Fusion Transformer (PFT) for per-mask approach semantic segmentation with multi-scale features. The proposed transformer decoder performs cross-attention between the learnable queries and each spatial feature from the feature pyramid in parallel and uses cross-scale inter-query attention to exchange complimentary information. We achieve competitive performance on three widely used semantic segmentation datasets. In particular, on ADE20K validation set, our result with Swin-B backbone surpasses that of MaskFormer's with a much larger Swin-L backbone in both single-scale and multi-scale inference, achieving 54.1 mIoU and 55.7 mIoU respectively. Using a Swin-L backbone, we achieve single-scale 56.1 mIoU and multi-scale 57.4 mIoU, obtaining state-of-the-art performance on the dataset. Extensive experiments on three widely used semantic segmentation datasets verify the effectiveness of our proposed method. △ Less

Submitted 30 May, 2023; v1 submitted 11 January, 2022; originally announced January 2022.

arXiv:2110.13261 [pdf, other]

SWAP Test for an Arbitrary Number of Quantum States

Authors: Xavier Gitiaux, Ian Morris, Maria Emelianenko, Mingzhen Tian

Abstract: We develop a recursive algorithm to generalize the quantum SWAP test for an arbitrary number $m$ of quantum states requiring $O(m)$ controlled-swap (CSWAP) gates and $O(\log m)$ ancillary qubits. We construct a quantum circuit able to simultaneously measure overlaps of $m$ arbitrary pure states. Our construction relies on a pairing unitary that generates a superposition state where every pair of i… ▽ More We develop a recursive algorithm to generalize the quantum SWAP test for an arbitrary number $m$ of quantum states requiring $O(m)$ controlled-swap (CSWAP) gates and $O(\log m)$ ancillary qubits. We construct a quantum circuit able to simultaneously measure overlaps of $m$ arbitrary pure states. Our construction relies on a pairing unitary that generates a superposition state where every pair of input states is labelled by a basis state formed by the ancillaries. △ Less

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2109.02303 [pdf, other]

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Authors: Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li

Abstract: 3D human shape and pose estimation is the essential task for human motion analysis, which is widely used in many 3D applications. However, existing methods cannot simultaneously capture the relations at multiple levels, including spatial-temporal level and human joint level. Therefore they fail to make accurate predictions in some hard scenarios when there is cluttered background, occlusion, or ex… ▽ More 3D human shape and pose estimation is the essential task for human motion analysis, which is widely used in many 3D applications. However, existing methods cannot simultaneously capture the relations at multiple levels, including spatial-temporal level and human joint level. Therefore they fail to make accurate predictions in some hard scenarios when there is cluttered background, occlusion, or extreme pose. To this end, we propose Multi-level Attention Encoder-Decoder Network (MAED), including a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attentions in a unified framework. STE consists of a series of cascaded blocks based on Multi-Head Self-Attention, and each block uses two parallel branches to learn spatial and temporal attention respectively. Meanwhile, KTD aims at modeling the joint level attention. It regards pose estimation as a top-down hierarchical process similar to SMPL kinematic tree. With the training set of 3DPW, MAED outperforms previous state-of-the-art methods by 6.2, 7.2, and 2.4 mm of PA-MPJPE on the three widely used benchmarks 3DPW, MPI-INF-3DHP, and Human3.6M respectively. Our code is available at https://github.com/ziniuwan/maed. △ Less

Submitted 6 September, 2021; originally announced September 2021.

arXiv:2108.08505 [pdf, other]

Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception

Authors: Bowen Li, Weixia Zhang, Meng Tian, Guangtao Zhai, Xianpei Wang

Abstract: Perceptual quality assessment of the videos acquired in the wilds is of vital importance for quality assurance of video services. The inaccessibility of reference videos with pristine quality and the complexity of authentic distortions pose great challenges for this kind of blind video quality assessment (BVQA) task. Although model-based transfer learning is an effective and efficient paradigm for… ▽ More Perceptual quality assessment of the videos acquired in the wilds is of vital importance for quality assurance of video services. The inaccessibility of reference videos with pristine quality and the complexity of authentic distortions pose great challenges for this kind of blind video quality assessment (BVQA) task. Although model-based transfer learning is an effective and efficient paradigm for the BVQA task, it remains to be a challenge to explore what and how to bridge the domain shifts for better video representation. In this work, we propose to transfer knowledge from image quality assessment (IQA) databases with authentic distortions and large-scale action recognition with rich motion patterns. We rely on both groups of data to learn the feature extractor. We train the proposed model on the target VQA databases using a mixed list-wise ranking loss function. Extensive experiments on six databases demonstrate that our method performs very competitively under both individual database and mixed database training settings. We also verify the rationality of each component of the proposed method and explore a simple manner for further improvement. △ Less

Submitted 5 April, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: Accepted to IEEE TCSVT

arXiv:2108.05568 [pdf, other]

A Contract Theory based Incentive Mechanism for Federated Learning

Authors: Mengmeng Tian, Yuxin Chen, Yuan Liu, Zehui Xiong, Cyril Leung, Chunyan Miao

Abstract: Federated learning (FL) serves as a data privacy-preserved machine learning paradigm, and realizes the collaborative model trained by distributed clients. To accomplish an FL task, the task publisher needs to pay financial incentives to the FL server and FL server offloads the task to the contributing FL clients. It is challenging to design proper incentives for the FL clients due to the fact that… ▽ More Federated learning (FL) serves as a data privacy-preserved machine learning paradigm, and realizes the collaborative model trained by distributed clients. To accomplish an FL task, the task publisher needs to pay financial incentives to the FL server and FL server offloads the task to the contributing FL clients. It is challenging to design proper incentives for the FL clients due to the fact that the task is privately trained by the clients. This paper aims to propose a contract theory based FL task training model towards minimizing incentive budget subject to clients being individually rational (IR) and incentive compatible (IC) in each FL training round. We design a two-dimensional contract model by formally defining two private types of clients, namely data quality and computation effort. To effectively aggregate the trained models, a contract-based aggregator is proposed. We analyze the feasible and optimal contract solutions to the proposed contract model. %Experimental results demonstrate that the proposed framework and contract model can effective improve the generation accuracy of FL tasks. Experimental results show that the generalization accuracy of the FL tasks can be improved by the proposed incentive mechanism where contract-based aggregation is applied. △ Less

Submitted 12 August, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures, International Workshop on Federated and Transfer Learning for Data Sparsity and Confidentiality in Conjunction with IJCAI 2021 (FTL-IJCAI'21), Best Student Paper Award

arXiv:2107.00222 [pdf, other]

Deep auxiliary learning for visual localization using colorization task

Authors: Mi Tian, Qiong Nie, Hao Shen, Xiahua Xia

Abstract: Visual localization is one of the most important components for robotics and autonomous driving. Recently, inspiring results have been shown with CNN-based methods which provide a direct formulation to end-to-end regress 6-DoF absolute pose. Additional information like geometric or semantic constraints is generally introduced to improve performance. Especially, the latter can aggregate high-level… ▽ More Visual localization is one of the most important components for robotics and autonomous driving. Recently, inspiring results have been shown with CNN-based methods which provide a direct formulation to end-to-end regress 6-DoF absolute pose. Additional information like geometric or semantic constraints is generally introduced to improve performance. Especially, the latter can aggregate high-level semantic information into localization task, but it usually requires enormous manual annotations. To this end, we propose a novel auxiliary learning strategy for camera localization by introducing scene-specific high-level semantics from self-supervised representation learning task. Viewed as a powerful proxy task, image colorization task is chosen as complementary task that outputs pixel-wise color version of grayscale photograph without extra annotations. In our work, feature representations from colorization network are embedded into localization network by design to produce discriminative features for pose regression. Meanwhile an attention mechanism is introduced for the benefit of localization performance. Extensive experiments show that our model significantly improve localization accuracy over state-of-the-arts on both indoor and outdoor datasets. △ Less

Submitted 1 July, 2021; originally announced July 2021.

arXiv:2105.11422 [pdf, other]

Multi-Level Attentive Convoluntional Neural Network for Crowd Counting

Authors: Mengxiao Tian, Hao Guo, Chengjiang Long

Abstract: Recently the crowd counting has received more and more attention. Especially the technology of high-density environment has become an important research content, and the relevant methods for the existence of extremely dense crowd are not optimal. In this paper, we propose a multi-level attentive Convolutional Neural Network (MLAttnCNN) for crowd counting. We extract high-level contextual informati… ▽ More Recently the crowd counting has received more and more attention. Especially the technology of high-density environment has become an important research content, and the relevant methods for the existence of extremely dense crowd are not optimal. In this paper, we propose a multi-level attentive Convolutional Neural Network (MLAttnCNN) for crowd counting. We extract high-level contextual information with multiple different scales applied in pooling, and use multi-level attention modules to enrich the characteristics at different layers to achieve more efficient multi-scale feature fusion, which is able to be used to generate a more accurate density map with dilated convolutions and a $1\times 1$ convolution. The extensive experiments on three available public datasets show that our proposed network achieves outperformance to the state-of-the-art approaches. △ Less

Submitted 24 May, 2021; originally announced May 2021.

arXiv:2104.13881 [pdf, ps, other]

Large Scale Prediction with Decision Trees

Authors: Jason M. Klusowski, Peter M. Tian

Abstract: This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural 0-norm and 1-norm sparsity constraints. The theory applies to a wide range of models, including (ordinary or logistic) add… ▽ More This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural 0-norm and 1-norm sparsity constraints. The theory applies to a wide range of models, including (ordinary or logistic) additive regression models with component functions that are continuous, of bounded variation, or, more generally, Borel measurable. Consistency holds for arbitrary joint distributions of the predictor variables, thereby accommodating continuous, discrete, and/or dependent data. Finally, we show that these qualitative properties of individual trees are inherited by Breiman's random forests. A key step in the analysis is the establishment of an oracle inequality, which allows for a precise characterization of the goodness-of-fit and complexity tradeoff for a mis-specified model. △ Less

Submitted 13 November, 2023; v1 submitted 28 April, 2021; originally announced April 2021.

arXiv:2102.11099 [pdf, other]

RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection

Authors: Shunjie Dong, Qianqian Yang, Yu Fu, Mei Tian, Cheng Zhuo

Abstract: The novel 2019 Coronavirus (COVID-19) infection has spread world widely and is currently a major healthcare challenge around the world. Chest Computed Tomography (CT) and X-ray images have been well recognized to be two effective techniques for clinical COVID-19 disease diagnoses. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is pref… ▽ More The novel 2019 Coronavirus (COVID-19) infection has spread world widely and is currently a major healthcare challenge around the world. Chest Computed Tomography (CT) and X-ray images have been well recognized to be two effective techniques for clinical COVID-19 disease diagnoses. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment and treatment. However, considering the similarity between COVID-19 and pneumonia, CXR samples with deep features distributed near category boundaries are easily misclassified by the hyper-planes learned from limited training data. Moreover, most existing approaches for COVID-19 detection focus on the accuracy of prediction and overlook the uncertainty estimation, which is particularly important when dealing with noisy datasets. To alleviate these concerns, we propose a novel deep network named {\em RCoNet$^k_s$} for robust COVID-19 detection which employs {\em Deformable Mutual Information Maximization} (DeIM), {\em Mixed High-order Moment Feature} (MHMF) and {\em Multi-expert Uncertainty-aware Learning} (MUL). With DeIM, the mutual information (MI) between input data and the corresponding latent representations can be well estimated and maximized to capture compact and disentangled representational characteristics. Meanwhile, MHMF can fully explore the benefits of using high-order statistics and extract discriminative features of complex distributions in medical imaging. Finally, MUL creates multiple parallel dropout networks for each CXR image to evaluate uncertainty and thus prevent performance degradation caused by the noise in the data. △ Less

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2012.15419 [pdf, other]

An Experimental Evaluation of Transformer-based Language Models in the Biomedical Domain

Authors: Paul Grouchy, Shobhit Jain, Michael Liu, Kuhan Wang, Max Tian, Nidhi Arora, Hillary Ngai, Faiza Khan Khattak, Elham Dolatabadi, Sedef Akinli Kocak

Abstract: With the growing amount of text in health data, there have been rapid advances in large pre-trained models that can be applied to a wide variety of biomedical tasks with minimal task-specific modifications. Emphasizing the cost of these models, which renders technical replication challenging, this paper summarizes experiments conducted in replicating BioBERT and further pre-training and careful fi… ▽ More With the growing amount of text in health data, there have been rapid advances in large pre-trained models that can be applied to a wide variety of biomedical tasks with minimal task-specific modifications. Emphasizing the cost of these models, which renders technical replication challenging, this paper summarizes experiments conducted in replicating BioBERT and further pre-training and careful fine-tuning in the biomedical domain. We also investigate the effectiveness of domain-specific and domain-agnostic pre-trained models across downstream biomedical NLP tasks. Our finding confirms that pre-trained models can be impactful in some downstream NLP tasks (QA and NER) in the biomedical domain; however, this improvement may not justify the high cost of domain-specific pre-training. △ Less

Submitted 30 December, 2020; originally announced December 2020.

arXiv:2012.08545 [pdf, other]

doi 10.1038/s41550-021-01405-0

Accelerated, Scalable and Reproducible AI-driven Gravitational Wave Detection

Authors: E. A. Huerta, Asad Khan, Xiaobo Huang, Minyang Tian, Maksim Levental, Ryan Chard, Wei Wei, Maeve Heflin, Daniel S. Katz, Volodymyr Kindratenko, Dawei Mu, Ben Blaiszik, Ian Foster

Abstract: The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed… ▽ More The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed computing service. Using this workflow, an ensemble of four openly available AI models can be run on HAL to process an entire month's worth (August 2017) of advanced Laser Interferometer Gravitational-Wave Observatory data in just seven minutes, identifying all four all four binary black hole mergers previously identified in this dataset and reporting no misclassifications. This approach combines advances in AI, distributed computing, and scientific data infrastructure to open new pathways to conduct reproducible, accelerated, data-driven discovery. △ Less

Submitted 9 July, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: 17 pages, 5 figures; v2: 12 pages, 6 figures. Accepted to Nature Astronomy. See also the Behind the Paper blog in Nature Astronomy "https://astronomycommunity.nature.com/posts/from-disruption-to-sustained-innovation-artificial-intelligence-for-gravitational-wave-astrophysics"

MSC Class: 68T01; 68T35; 83C35; 83C57

Journal ref: Nat Astron 5, 1062-1068 (2021)

arXiv:2012.05352 [pdf]

Electric Vehicle Battery Remaining Charging Time Estimation Considering Charging Accuracy and Charging Profile Prediction

Authors: Junzhe Shi, Min Tian, Sangwoo Han, Tung-Yan Wu, Yifan Tang

Abstract: Electric vehicles (EVs) have been growing rapidly in popularity in recent years and have become a future trend. It is an important aspect of user experience to know the Remaining Charging Time (RCT) of an EV with confidence. However, it is difficult to find an algorithm that accurately estimates the RCT for vehicles in the current EV market. The maximum RCT estimation error of the Tesla Model X ca… ▽ More Electric vehicles (EVs) have been growing rapidly in popularity in recent years and have become a future trend. It is an important aspect of user experience to know the Remaining Charging Time (RCT) of an EV with confidence. However, it is difficult to find an algorithm that accurately estimates the RCT for vehicles in the current EV market. The maximum RCT estimation error of the Tesla Model X can be as high as 60 minutes from a 10 % to 99 % state-of-charge (SOC) while charging at direct current (DC). A highly accurate RCT estimation algorithm for electric vehicles is in high demand and will continue to be as EVs become more popular. There are currently two challenges to arriving at an accurate RCT estimate. First, most commercial chargers cannot provide requested charging currents during a constant current (CC) stage. Second, it is hard to predict the charging current profile in a constant voltage (CV) stage. To address the first issue, this study proposes an RCT algorithm that updates the charging accuracy online in the CC stage by considering the confidence interval between the historical charging accuracy and real-time charging accuracy data. To solve the second issue, this study proposes a battery resistance prediction model to predict charging current profiles in the CV stage, using a Radial Basis Function (RBF) neural network (NN). The test results demonstrate that the RCT algorithm proposed in this study achieves an error rate improvement of 73.6 % and 84.4 % over the traditional method in the CC and CV stages, respectively. △ Less

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2011.02683 [pdf, other]

Nonparametric Variable Screening with Optimal Decision Stumps

Authors: Jason M. Klusowski, Peter M. Tian

Abstract: Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model. Despite the widespread use of tree based variable importance measures, pinning down their theoretical properties has been challenging and therefore largely unexplored. To address this gap between theory and practice, we derive finite sample performance guara… ▽ More Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model. Despite the widespread use of tree based variable importance measures, pinning down their theoretical properties has been challenging and therefore largely unexplored. To address this gap between theory and practice, we derive finite sample performance guarantees for variable selection in nonparametric models using a single-level CART decision tree (a decision stump). Under standard operating assumptions in variable screening literature, we find that the marginal signal strength of each variable and ambient dimensionality can be considerably weaker and higher, respectively, than state-of-the-art nonparametric variable selection methods. Furthermore, unlike previous marginal screening methods that attempt to directly estimate each marginal projection via a truncated basis expansion, the fitted model used here is a simple, parsimonious decision stump, thereby eliminating the need for tuning the number of basis terms. Thus, surprisingly, even though decision stumps are highly inaccurate for estimation purposes, they can still be used to perform consistent model selection. △ Less

Submitted 10 December, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2011.00569 [pdf, other]

DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation

Authors: Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Meng Tian, Yi-Chieh Liu, Ting-Wei Wu, I-Hung Lin, Kang Wang, Hiromasa Morikawa, Hernghua Chang, Jesper Tegner, Marcel Worring

Abstract: In this work, we propose an AI-based method that intends to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural networks-based (DNN-based) module, including a retinal disease identifier and clinical description generator, and a DNN visual explanation module. To train and… ▽ More In this work, we propose an AI-based method that intends to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural networks-based (DNN-based) module, including a retinal disease identifier and clinical description generator, and a DNN visual explanation module. To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset. Also, as ground truth, we provide a retinal image dataset manually labeled by ophthalmologists to qualitatively show, the proposed AI-based method is effective. With our experimental results, we show that the proposed method is quantitatively and qualitatively effective. Our method is capable of creating meaningful retinal image descriptions and visual explanations that are clinically relevant. △ Less

Submitted 1 November, 2020; originally announced November 2020.

Comments: Accepted to IEEE WACV 2021

arXiv:2007.09798 [pdf, other]

Counterfactual Learning to Rank using Heterogeneous Treatment Effect Estimation

Authors: Mucun Tian, Chun Guo, Vito Ostuni, Zhen Zhu

Abstract: Learning-to-Rank (LTR) models trained from implicit feedback (e.g. clicks) suffer from inherent biases. A well-known one is the position bias -- documents in top positions are more likely to receive clicks due in part to their position advantages. To unbiasedly learn to rank, existing counterfactual frameworks first estimate the propensity (probability) of missing clicks with intervention data fro… ▽ More Learning-to-Rank (LTR) models trained from implicit feedback (e.g. clicks) suffer from inherent biases. A well-known one is the position bias -- documents in top positions are more likely to receive clicks due in part to their position advantages. To unbiasedly learn to rank, existing counterfactual frameworks first estimate the propensity (probability) of missing clicks with intervention data from a small portion of search traffic, and then use inverse propensity score (IPS) to debias LTR algorithms on the whole data set. These approaches often assume the propensity only depends on the position of the document, which may cause high estimation variance in applications where the search context (e.g. query, user) varies frequently. While context-dependent propensity models reduce variance, accurate estimations may require randomization or intervention on a large amount of traffic, which may not be realistic in real-world systems, especially for long tail queries. In this work, we employ heterogeneous treatment effect estimation techniques to estimate position bias when intervention click data is limited. We then use such estimations to debias the observed click distribution and re-draw a new de-biased data set, which can be used for any LTR algorithms. We conduct simulations with varying experiment conditions and show the effectiveness of the proposed method in regimes with long tail queries and sparse clicks. △ Less

Submitted 19 July, 2020; originally announced July 2020.

Comments: 9 pages; to be published in SIGIR eCom'20

ACM Class: H.3.3

Showing 1–50 of 77 results for author: Tian, M