Showing 1–50 of 118 results for author: Yang, C

Searching in archive stat.
  1. arXiv:2405.20838  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    einspace: Searching for Neural Architectures from Fundamental Operations

    Authors: Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley

    Abstract: Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shift…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Project page at https://linusericsson.github.io/einspace/

  2. arXiv:2405.20550  [pdf]

    cs.LG stat.ML

    Uncertainty Quantification for Deep Learning

    Authors: Peter Jan van Leeuwen, J. Christine Chiu, C. Kevin Yang

    Abstract: A complete and statistically consistent uncertainty quantification for deep learning is provided, including the sources of uncertainty arising from (1) the new input data, (2) the training and testing data (3) the weight vectors of the neural network, and (4) the neural network because it is not a perfect predictor. Using Bayes Theorem and conditional probability densities, we demonstrate how each…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 25 pages, 4 figures, submitted to Environmental Data Science

    MSC Class: 62D99 ACM Class: G.3

  3. arXiv:2405.02539  [pdf, ps, other]

    stat.ME

    Distributed Iterative Hard Thresholding for Variable Selection in Tobit Models

    Authors: Changxin Yang, Zhongyi Zhu, Heng Lian

    Abstract: While extensive research has been conducted on high-dimensional data and on regression with left-censored responses, simultaneously addressing these complexities remains challenging, with only a few proposed methods available. In this paper, we utilize the Iterative Hard Thresholding (IHT) algorithm on the Tobit model in such a setting. Theoretical analysis demonstrates that our estimator converge…

    Submitted 3 May, 2024; originally announced May 2024.

  4. arXiv:2404.09729  [pdf]

    eess.SP cs.IT cs.LG stat.ME

    Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

    Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

    Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG mor…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

    ACM Class: I.5.2

  5. arXiv:2404.02120  [pdf, other]

    stat.AP stat.ME

    DEMO: Dose Exploration, Monitoring, and Optimization Using a Biological Mediator for Clinical Outcomes

    Authors: Cheng-Han Yang, Peter F. Thall, Ruitao Lin

    Abstract: Phase 1-2 designs provide a methodological advance over phase 1 designs for dose finding by using both clinical response and toxicity. A phase 1-2 trial still may fail to select a truly optimal dose because early response is not a perfect surrogate for long term therapeutic success. To address this problem, a generalized phase 1-2 design first uses a phase 1-2 design's components to identify a se…

    Submitted 2 April, 2024; originally announced April 2024.

  6. arXiv:2403.11960  [pdf, other]

    cs.LG stat.ML

    CASPER: Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation

    Authors: Baoyu Jing, Dawei Zhou, Kan Ren, Carl Yang

    Abstract: Spatiotemporal time series is the foundation of understanding human activities and their impacts, which is usually collected via monitoring sensors placed at different locations. The collected data usually contains missing values due to various failures, which have significant impact on data analysis. To impute the missing values, a lot of methods have been introduced. When recovering a specific d…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Preprint. Work in progress

  7. arXiv:2403.10424  [pdf, other]

    cs.LG stat.ML

    Structured Evaluation of Synthetic Tabular Data

    Authors: Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

    Abstract: Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of the many metrics. To address this issue, we propose an evaluation framework with a single, mathematica…

    Submitted 29 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  8. arXiv:2402.13187  [pdf, other]

    cs.LG cs.DS stat.CO stat.ML

    Testing Calibration in Nearly-Linear Time

    Authors: Lunjia Hu, Arun Jambulapati, Kevin Tian, Chutong Yang

    Abstract: In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we…

    Submitted 21 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  9. arXiv:2402.11283  [pdf, other]

    math.NA stat.ML

    Deep adaptive sampling for surrogate modeling without labeled data

    Authors: Xili Wang, Kejun Tang, Jiayu Zhai, Xiaoliang Wan, Chao Yang

    Abstract: Surrogate modeling is of great practical significance for parametric differential equation systems. In contrast to classical numerical methods, using physics-informed deep learning methods to construct simulators for such systems is a promising direction due to its potential to handle high dimensionality, which requires minimizing a loss over a training set of random samples. However, the random s…

    Submitted 17 February, 2024; originally announced February 2024.

  10. arXiv:2311.14652  [pdf, other]

    cs.LG cs.CL stat.ML

    One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

    Authors: Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

    Abstract: Attention computation takes both the time complexity of $O(n^2)$ and the space complexity of $O(n^2)$ simultaneously, which makes deploying Large Language Models (LLMs) in streaming applications that involve long contexts requiring substantial computational resources. In recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that is able to support a 128K-long document, in our paper, we f…

    Submitted 5 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  11. arXiv:2310.12462  [pdf, other]

    cs.LG cs.CL stat.ML

    Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

    Authors: Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

    Abstract: In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and…

    Submitted 19 October, 2023; originally announced October 2023.

  12. arXiv:2309.16240  [pdf, other]

    cs.LG cs.AI stat.ML

    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

    Authors: Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen

    Abstract: The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and depe…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Preprint

  13. arXiv:2309.08642  [pdf, other]

    eess.SY cs.AI cs.LG stat.ME

    A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

    Authors: Wei Jiang, Zhongkai Yi, Li Wang, Hanwei Zhang, Jihai Zhang, Fangquan Lin, Cheng Yang

    Abstract: Aggregating distributed energy resources in power systems significantly increases uncertainties, in particular caused by the fluctuation of renewable energy generation. This issue has driven the necessity of widely exploiting advanced predictive control techniques under uncertainty to ensure long-term economics and decarbonization. In this paper, we propose a real-time uncertainty-aware energy dis…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Preprint. Accepted by CIKM 23

  14. arXiv:2308.11138  [pdf, ps, other]

    stat.ME cs.CL q-fin.RM stat.ML

    NLP-based detection of systematic anomalies among the narratives of consumer complaints

    Authors: Peiheng Gao, Ning Sun, Xuefeng Wang, Chen Yang, Ričardas Zitikis

    Abstract: We develop an NLP-based procedure for detecting systematic nonmeritorious consumer complaints, simply called systematic anomalies, among complaint narratives. While classification algorithms are used to detect pronounced anomalies, in the case of smaller and frequent systematic anomalies, the algorithms may falter due to a variety of reasons, including technical ones as well as natural limitations…

    Submitted 26 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

  15. arXiv:2308.06106  [pdf, other]

    cs.LG stat.ML

    Hawkes Processes with Delayed Granger Causality

    Authors: Chao Yang, Hengyuan Miao, Shuang Li

    Abstract: We aim to explicitly model the delayed Granger causal effects based on multivariate Hawkes processes. The idea is inspired by the fact that a causal event usually takes some time to exert an effect. Studying this time lag itself is of interest. Given the proposed model, we first prove the identifiability of the delay parameter under mild conditions. We further investigate a model estimation method…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 19 pages

  16. arXiv:2306.17584  [pdf, other]

    stat.ME stat.AP

    Flexible and Accurate Methods for Estimation and Inference of Gaussian Graphical Models with Applications

    Authors: Yueqi Qian, Xianghong Hu, Can Yang

    Abstract: The Gaussian graphical model (GGM) incorporates an undirected graph to represent the conditional dependence between variables, with the precision matrix encoding partial correlation between pair of variables given the others. To achieve flexible and accurate estimation and inference of GGM, we propose the novel method FLAG, which utilizes the random effects model for pairwise conditional regressio…

    Submitted 30 June, 2023; originally announced June 2023.

  17. arXiv:2305.18702  [pdf, other]

    stat.ML cs.LG math.NA

    Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs

    Authors: Kejun Tang, Jiayu Zhai, Xiaoliang Wan, Chao Yang

    Abstract: Solving partial differential equations (PDEs) is a central task in scientific computing. Recently, neural network approximation of PDEs has received increasing attention due to its flexible meshless discretization and its potential for high-dimensional problems. One fundamental numerical difficulty is that random samples in the training set introduce statistical errors into the discretization of l…

    Submitted 14 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: ICLR, 2024

  18. arXiv:2305.14612  [pdf]

    cs.CV stat.AP

    Assessment of Anterior Cruciate Ligament Injury Risk Based on Human Key Points Detection Algorithm

    Authors: Ziyu Gong, Xiong Zhao, Chen Yang

    Abstract: This paper aims to detect the potential injury risk of the anterior cruciate ligament (ACL) by proposing an ACL potential injury risk assessment algorithm based on key points of the human body detected using computer vision technology. To obtain the key points data of the human body in each frame, OpenPose, an open source computer vision algorithm, was employed. The obtained data underwent preproc…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 17 pages and 6 figures

  19. ggpicrust2: an R package for PICRUSt2 predicted functional profile analysis and visualization

    Authors: Chen Yang, Jiahao Mai, Xuan Cao, Aaron Burberry, Fabio Cominelli, Liangliang Zhang

    Abstract: Microbiome research is now moving beyond the compositional analysis of microbial taxa in a sample. Increasing evidence from large human microbiome studies suggests that functional consequences of changes in the intestinal microbiome may provide more power for studying their impact on inflammation and immune responses. Although 16S rRNA analysis is one of the most popular and a cost-effective metho…

    Submitted 9 April, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: 4 pages, 1 figure

  20. arXiv:2303.05341  [pdf, other]

    stat.ML cs.LG eess.IV

    Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients

    Authors: Yuming Sun, Jian Kang, Chinmay Haridas, Nicholas R. Mayne, Alexandra L. Potter, Chi-Fu Jeffrey Yang, David C. Christiani, Yi Li

    Abstract: Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective patient-centered therapies. The National Lung Screening Trial (NLST) employed computed tomography texture analysis, which provides objective measurements of texture patterns on CT scans, to quantify the mortality risks of lung cancer patients. Partially…

    Submitted 29 September, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  21. arXiv:2303.02566  [pdf, other]

    stat.ML cs.LG stat.CO

    MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging Auxiliary Information

    Authors: Zhiwei Wang, Fa Zhang, Cong Zheng, Xianghong Hu, Mingxuan Cai, Can Yang

    Abstract: In various practical situations, matrix factorization methods suffer from poor data quality, such as high data sparsity and low signal-to-noise ratio (SNR). Here, we consider a matrix factorization problem by utilizing auxiliary information, which is massively available in real-world applications, to overcome the challenges caused by poor data quality. Unlike existing methods that mainly rely on s…

    Submitted 12 February, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

  22. arXiv:2302.13231  [pdf]

    eess.SY stat.AP

    A Synthetic Texas Backbone Power System with Climate-Dependent Spatio-Temporal Correlated Profiles

    Authors: Jin Lu, Xingpeng Li, Hongyi Li, Taher Chegini, Carlos Gamarra, Y. C. Ethan Yang, Margaret Cook, Gavin Dillingham

    Abstract: Most power system test cases only have electrical parameters and can be used only for studies based on a snapshot of system profiles. To facilitate more comprehensive and practical studies, a synthetic power system including spatio-temporal correlated profiles for the entire year of 2019 at one-hour resolution has been created in this work. This system, referred to as the synthetic Texas 123-bus b…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: 10 pages, 14 figures, 12 tables

  23. arXiv:2302.06807  [pdf, other]

    stat.ML cs.LG

    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space

    Authors: Xiran Fan, Chun-Hao Yang, Baba C. Vemuri

    Abstract: Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large margin classifiers setting leading to a non-convex optimization problem. In this paper, we…

    Submitted 28 September, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: To appear at Neural Information Processing Systems (NeurIPS) 2023

  24. arXiv:2211.08881  [pdf]

    stat.AP

    A foretaste of Qatar 2022: decreased playing time of internationals after the Africa Cup of Nations

    Authors: Otto Kolbinger, Chenyuyan Yang, Martin Lames

    Abstract: Due to the unfavourable climatic conditions in Qatar during summertime, the FIFA World Cup 2022 will be played during on-going seasons of the major European leagues. This study investigates how national teams' tournaments scheduled at such a time window impact the playing time of released players, using data from the Africa Cups of Nations (AFCON). For 262 internationals playing at the 2013, 2015…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 15 pages, 2 figures

  25. arXiv:2207.13248  [pdf, ps, other]

    stat.ME stat.AP

    Tail maximal dependence in bivariate models: estimation and applications

    Authors: Ning Sun, Chen Yang, Ričardas Zitikis

    Abstract: Assessing dependence within co-movements of financial instruments has been of much interest in risk management. Typically, indices of tail dependence are used to quantify the strength of such dependence, although many of the indices underestimate the strength. Hence, we advocate the use of a statistical procedure designed to estimate the maximal strength of dependence that can possibly occur among…

    Submitted 19 September, 2022; v1 submitted 26 July, 2022; originally announced July 2022.

    MSC Class: 62P05

  26. arXiv:2204.07615  [pdf, other]

    cs.LG stat.ML

    TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets

    Authors: Chengrun Yang, Gabriel Bender, Hanxiao Liu, Pieter-Jan Kindermans, Madeleine Udell, Yifeng Lu, Quoc Le, Da Huang

    Abstract: The best neural architecture for a given machine learning problem depends on many factors: not only the complexity and structure of the dataset, but also on resource constraints including latency, compute, energy consumption, etc. Neural architecture search (NAS) for tabular datasets is an important but under-explored problem. Previous NAS algorithms designed for image search spaces incorporate re…

    Submitted 20 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022; 30 pages, 15 figures, 7 tables

  27. arXiv:2203.15804  [pdf]

    cs.LG q-bio.QM stat.AP

    Improving The Diagnosis of Thyroid Cancer by Machine Learning and Clinical Data

    Authors: Nan Miles Xi, Lin Wang, Chuanjia Yang

    Abstract: Thyroid cancer is a common endocrine carcinoma that occurs in the thyroid gland. Much effort has been invested in improving its diagnosis, and thyroidectomy remains the primary treatment method. A successful operation without unnecessary side injuries relies on an accurate preoperative diagnosis. Current human assessment of thyroid nodule malignancy is prone to errors and may not guarantee an accu…

    Submitted 27 March, 2022; originally announced March 2022.

  28. arXiv:2201.09433  [pdf, other]

    cs.LG cs.CC stat.ML

    Active Learning Polynomial Threshold Functions

    Authors: Omri Ben-Eliezer, Max Hopkins, Chutong Yang, Hantao Yu

    Abstract: We initiate the study of active learning polynomial threshold functions (PTFs). While traditional lower bounds imply that even univariate quadratics cannot be non-trivially actively learned, we show that allowing the learner basic access to the derivatives of the underlying classifier circumvents this issue and leads to a computationally efficient algorithm for active learning degree-$d$ univariat…

    Submitted 1 October, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    MSC Class: 68Q32

  29. arXiv:2112.14038  [pdf, other]

    math.NA stat.ML

    DAS-PINNs: A deep adaptive sampling method for solving high-dimensional partial differential equations

    Authors: Kejun Tang, Xiaoliang Wan, Chao Yang

    Abstract: In this work we propose a deep adaptive sampling (DAS) method for solving partial differential equations (PDEs), where deep neural networks are utilized to approximate the solutions of PDEs and deep generative models are employed to generate new collocation points that refine the training set. The overall procedure of DAS consists of two components: solving the PDEs by minimizing the residual loss…

    Submitted 5 July, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

  30. arXiv:2112.03402  [pdf, other]

    cs.LG cs.AI stat.ML

    Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design

    Authors: Xiran Fan, Chun-Hao Yang, Baba C. Vemuri

    Abstract: Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use loca…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: 19 pages, 6 figures

  31. arXiv:2111.13378  [pdf, other]

    stat.ME stat.AP

    Differentially Private Methods for Releasing Results of Stability Analyses

    Authors: Chengxin Yang, Jerome P. Reiter

    Abstract: Data stewards and analysts can promote transparent and trustworthy science and policy-making by facilitating assessments of the sensitivity of published results to alternate analysis choices. For example, researchers may want to assess whether the results change substantially when different subsets of data points (e.g., sets formed by demographic characteristics) are used in the analysis, or when…

    Submitted 23 August, 2023; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: 30 pages, 3 figures

  32. arXiv:2109.14236  [pdf, other]

    cs.LG cs.CR cs.DC cs.IT stat.ML

    LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning

    Authors: Jinhyun So, Chaoyang He, Chien-Sheng Yang, Songze Li, Qian Yu, Ramy E. Ali, Basak Guler, Salman Avestimehr

    Abstract: Secure model aggregation is a key component of federated learning (FL) that aims at protecting the privacy of each user's individual model while allowing for their global aggregation. It can be applied to any aggregation-based FL approach for training a global or personalized model. Model aggregation needs to also be resilient against likely user dropouts in FL systems, making its design substanti…

    Submitted 1 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: This paper is accepted to the 5th MLSys Conference, Santa Clara, CA, USA, 2022

  33. arXiv:2106.15743  [pdf, other]

    stat.ME math.ST

    BONuS: Multiple multivariate testing with a data-adaptive test statistic

    Authors: Chiao-Yu Yang, Lihua Lei, Nhat Ho, Will Fithian

    Abstract: We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the "counting knockoffs" pro…

    Submitted 1 July, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

  34. arXiv:2106.13423  [pdf, other]

    cs.LG cs.AI cs.DC stat.ML

    Federated Graph Classification over Non-IID Graphs

    Authors: Han Xie, Jing Ma, Li Xiong, Carl Yang

    Abstract: Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collab…

    Submitted 7 November, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2021

  35. arXiv:2106.08171  [pdf, other]

    cs.LG stat.ML

    Evaluating Modules in Graph Contrastive Learning

    Authors: Ganqu Cui, Yufeng Du, Cheng Yang, Jie Zhou, Liang Xu, Xing Zhou, Xingyi Cheng, Zhiyuan Liu

    Abstract: The recent emergence of contrastive learning approaches facilitates the application on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works only performed model-level evaluation, and di…

    Submitted 2 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

  36. arXiv:2105.09670  [pdf, other]

    stat.ML cs.LG

    Ensemble machine learning approach for screening of coronary heart disease based on echocardiography and risk factors

    Authors: Jingyi Zhang, Huolan Zhu, Yongkai Chen, Chenguang Yang, Huimin Cheng, Yi Li, Wenxuan Zhong, Fang Wang

    Abstract: Background: Extensive clinical evidence suggests that a preventive screening of coronary heart disease (CHD) at an earlier stage can greatly reduce the mortality rate. We use 64 two-dimensional speckle tracking echocardiography (2D-STE) features and seven clinical features to predict whether one has CHD. Methods: We develop a machine learning approach that integrates a number of popular classifica…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: 30 pages, 5 figures, 5 tables

  37. arXiv:2101.00323  [pdf, other]

    stat.ML cs.LG math.OC

    TenIPS: Inverse Propensity Sampling for Tensor Completion

    Authors: Chengrun Yang, Lijun Ding, Ziyang Wu, Madeleine Udell

    Abstract: Tensors are widely used to represent multiway arrays of data. The recovery of missing entries in a tensor has been extensively studied, generally under the assumption that entries are missing completely at random (MCAR). However, in most practical settings, observations are missing not at random (MNAR): the probability that a given entry is observed (also called the propensity) may depend on other…

    Submitted 22 April, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

    Comments: AISTATS 2021

  38. arXiv:2011.10195  [pdf, ps, other]

    stat.ME

    Detecting systematic anomalies affecting systems when inputs are stationary time series

    Authors: Ning Sun, Chen Yang, Ričardas Zitikis

    Abstract: We develop an anomaly-detection method when systematic anomalies, possibly statistically very similar to genuine inputs, are affecting control systems at the input and/or output stages. The method allows anomaly-free inputs (i.e., those before contamination) to originate from a wide class of random sequences, thus opening up possibilities for diverse applications. To illustrate how the method work…

    Submitted 31 January, 2022; v1 submitted 19 November, 2020; originally announced November 2020.

    MSC Class: 94A12; 94A13; 62P30; 62P35

  39. arXiv:2010.14589  [pdf, other]

    cs.CV stat.ME

    Nested Grassmannians for Dimensionality Reduction with Applications

    Authors: Chun-Hao Yang, Baba C. Vemuri

    Abstract: In the recent past, nested structures in Riemannian manifolds has been studied in the context of dimensionality reduction as an alternative to the popular principal geodesic analysis (PGA) technique, for example, the principal nested spheres. In this paper, we propose a novel framework for constructing a nested sequence of homogeneous Riemannian manifolds. Common examples of homogeneous Riemannian…

    Submitted 1 March, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: 21 pages, 9 figures. Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org

  40. arXiv:2010.12636  [pdf, ps, other]

    cs.LG cs.CE stat.ML

    Nonseparable Symplectic Neural Networks

    Authors: Shiying Xiong, Yunjin Tong, Xingzhe He, Shuqi Yang, Cheng Yang, Bo Zhu

    Abstract: Predicting the behaviors of Hamiltonian systems has been drawing increasing attention in scientific machine learning. However, the vast majority of the literature was focused on predicting separable Hamiltonian systems with their kinematic and potential energy terms being explicitly decoupled while building data-driven paradigms to predict nonseparable Hamiltonian systems that are ubiquitous in fl…

    Submitted 19 February, 2022; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: ICLR2021

  41. arXiv:2009.05204  [pdf, other]

    cs.LG stat.ML

    Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization

    Authors: Qi Zhu, Carl Yang, Yidan Xu, Haonan Wang, Chao Zhang, Jiawei Han

    Abstract: Graph neural networks (GNNs) have achieved superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs. Some recent work started to study the pre-training of GNNs. However, none of them provide theoretical insights into the design of their frameworks, or clear requirements and guarantees towards their transferability. In this work, we establish a…

    Submitted 26 October, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2021

  42. Adaptive Graph Encoder for Attributed Graph Embedding

    Authors: Ganqu Cui, Jie Zhou, Cheng Yang, Zhiyuan Liu

    Abstract: Attributed graph embedding, which learns vector representations from graph topology and node features, is a challenging task for graph analysis. Recently, methods based on graph convolutional networks (GCNs) have made great progress on this task. However, existing GCN-based methods have three major drawbacks. Firstly, our experiments indicate that the entanglement of graph convolutional filters and…

    Submitted 3 July, 2020; originally announced July 2020.

    Comments: To appear in KDD 2020

  43. arXiv:2006.08816  [pdf, other]

    cs.LG stat.ML

    Signed Graph Metric Learning via Gershgorin Disc Perfect Alignment

    Authors: Cheng Yang, Gene Cheung, Wei Hu

    Abstract: Given a convex and differentiable objective $Q(\M)$ for a real symmetric matrix $\M$ in the positive definite (PD) cone -- used to compute Mahalanobis distances -- we propose a fast general metric learning framework that is entirely projection-free. We first assume that $\M$ resides in a space $\cS$ of generalized graph Laplacian matrices corresponding to balanced signed graphs. $\M \in \cS$ that…

    Submitted 10 June, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: code available: https://github.com/bobchengyang/SGML

  44. arXiv:2006.06983  [pdf, other]

    cs.LG cs.DC stat.ML

    Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

    Authors: Chengxu Yang, Qipeng Wang, Mengwei Xu, Zhenpeng Chen, Kaigui Bian, Yunxin Liu, Xuanzhe Liu

    Abstract: Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the various hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., causing a…

    Submitted 12 March, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

  45. arXiv:2006.04239  [pdf, other]

    cs.LG cs.IR stat.ML

    Unsupervised Differentiable Multi-aspect Network Embedding

    Authors: Chanyoung Park, Carl Yang, Qi Zhu, Donghyun Kim, Hwanjo Yu, Jiawei Han

    Abstract: Network embedding is an influential graph mining technique for representing nodes in a graph as distributed vectors. However, the majority of network embedding methods focus on learning a single vector representation for each node, which has been recently criticized for not being capable of modeling multiple aspects of a node. To capture the multiple aspects of each node, existing studies mainly r…

    Submitted 7 July, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: KDD 2020 (Research Track). 9 Pages + Appendix (2 Pages). Source code can be found at https://github.com/pcy1302/asp2vec. Typo fixed in Fig.2

  46. arXiv:2006.04216  [pdf, other]

    cs.LG cs.AI stat.ML

    Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

    Authors: Chengrun Yang, Jicong Fan, Ziyang Wu, Madeleine Udell

    Abstract: Data scientists seeking a good supervised learning model on a new dataset have many choices to make: they must preprocess the data, select features, possibly reduce the dimension, select an estimation algorithm, and choose hyperparameters for each of these pipeline components. With new pipeline components comes a combinatorial explosion in the number of choices! In this work, we design a new AutoM…

    Submitted 7 June, 2020; originally announced June 2020.

    Comments: This is an extended version of AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space (DOI: 10.1145/3394486.3403197) at KDD 2020

  47. arXiv:2005.13974  [pdf]

    q-fin.ST cs.CE stat.OT

    On the Bound of Cumulative Return in Trading Series and the Verification Using Technical Trading Rules

    Authors: Can Yang, Junjie Zhai, Helong Li

    Abstract: Although technical trading rules are widely used in stock markets, their profitability remains controversial. This paper first presents and proves the upper bound of cumulative return, and then introduces many conventional technical trading rules. Furthermore, with the help of bootstrap methodology, we investigate the profitability of technical trading rules on different inte…

    Submitted 19 May, 2020; originally announced May 2020.

  48. arXiv:2005.04843  [pdf, other]

    cs.LG cs.SI stat.ML

    Semi-supervised Hypergraph Node Classification on Hypergraph Line Expansion

    Authors: Chaoqi Yang, Ruijie Wang, Shuochao Yao, Tarek Abdelzaher

    Abstract: Previous hypergraph expansions are solely carried out on either vertex level or hyperedge level, thereby missing the symmetric nature of data co-occurrence, and resulting in information loss. To address the problem, this paper treats vertices and hyperedges equally and proposes a new hypergraph formulation named the \emph{line expansion (LE)} for hypergraph learning. The new expansion bijectively…

    Submitted 13 April, 2023; v1 submitted 10 May, 2020; originally announced May 2020.

    Comments: CIKM 2022, GitHub: https://github.com/ycq091044/LEGCN

  49. Robust Non-Linear Matrix Factorization for Dictionary Learning, Denoising, and Clustering

    Authors: Jicong Fan, Chengrun Yang, Madeleine Udell

    Abstract: Low dimensional nonlinear structure abounds in datasets across computer vision and machine learning. Kernelized matrix factorization techniques have recently been proposed to learn these nonlinear structures for denoising, classification, dictionary learning, and missing data imputation, by observing that the image of the matrix in a sufficiently large feature space is low-rank. However, these non…

    Submitted 2 December, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Journal ref: IEEE Transactions on Signal Processing 69, 1755-1770 (2021)

  50. arXiv:2005.00072  [pdf, other]

    econ.EM cs.LG stat.AP

    Two Burning Questions on COVID-19: Did shutting down the economy help? Can we (partially) reopen the economy without risking the second wave?

    Authors: Anish Agarwal, Abdullah Alomar, Arnab Sarker, Devavrat Shah, Dennis Shen, Cindy Yang

    Abstract: As we reach the apex of the COVID-19 pandemic, the most pressing question facing us is: can we even partially reopen the economy without risking a second wave? We first need to understand if shutting down the economy helped. And if it did, is it possible to achieve similar gains in the war against the pandemic while partially opening up the economy? To do so, it is critical to understand the effec…

    Submitted 10 May, 2020; v1 submitted 30 April, 2020; originally announced May 2020.