Showing 1–46 of 46 results for author: Song, S

Searching in archive stat.
  1. arXiv:2405.19058  [pdf, other]

    stat.ME

    Participation bias in the estimation of heritability and genetic correlation

    Authors: Shuang Song, Stefania Benonisdottir, Jun S. Liu, Augustine Kong

    Abstract: It is increasingly recognized that participation bias can pose problems for genetic studies. Recently, to overcome the challenge that genetic information of non-participants is unavailable, it is shown that by comparing the IBD (identity by descent) shared and not-shared segments among the participants, one can estimate the genetic component underlying participation. That, however, does not direct…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2310.15454  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Learning with Public Features

    Authors: Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

    Abstract: We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not re…

    Submitted 23 October, 2023; originally announced October 2023.

  3. arXiv:2306.15163  [pdf, other]

    stat.ML cs.LG

    Wasserstein Generative Regression

    Authors: Shanshan Song, Tong Wang, Guohao Shen, Yuanyuan Lin, Jian Huang

    Abstract: In this paper, we propose a new and unified approach for nonparametric regression and conditional distribution learning. Our approach simultaneously estimates a regression function and a conditional generator using a generative learning framework, where a conditional generator is a function that can generate samples from a conditional distribution. The main idea is to estimate a conditional genera…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 50 pages, including appendix. 5 figures and 6 tables in the main text. 1 figure and 7 tables in the appendix

    MSC Class: 62G08; 68T07

  4. arXiv:2302.07975  [pdf, other]

    cs.LG cs.CR stat.ML

    Multi-Task Differential Privacy Under Distribution Skew

    Authors: Walid Krichene, Prateek Jain, Shuang Song, Mukund Sundararajan, Abhradeep Thakurta, Li Zhang

    Abstract: We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of users. One important aspect of the problem, that can significantly impact quality, is the distribution skew among tasks. Certain tasks may have much fewer data samples than others, making them more susceptible to the noise added for privacy.…

    Submitted 15 February, 2023; originally announced February 2023.

  5. arXiv:2109.15261  [pdf, other]

    stat.ME math.PR math.ST q-bio.QM

    A simple and flexible test of sample exchangeability with applications to statistical genomics

    Authors: Alan J. Aw, Jeffrey P. Spence, Yun S. Song

    Abstract: In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics,…

    Submitted 30 August, 2023; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: 24 pages. Supplementary Information file (38 pages, contains mathematical proofs) is available at https://github.com/songlab-cal/flinty/

    MSC Class: 62G10; 62H15; 62P10

    ACM Class: G.3

  6. arXiv:2107.09802  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates

    Authors: Steve Chien, Prateek Jain, Walid Krichene, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

    Abstract: We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on bench…

    Submitted 20 July, 2021; originally announced July 2021.

  7. arXiv:2105.10590  [pdf, other]

    stat.ML cs.LG q-bio.BM q-bio.QM

    Parallelizing Contextual Bandits

    Authors: Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan

    Abstract: Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, simultaneously proposing a batch of decisions, which leverages available resources for parallel experimentation, has the potential to rapidly accelerate exploration. We present a family of (parallel) contextual bandit algorithms applicable to problems with bounded e…

    Submitted 5 February, 2023; v1 submitted 21 May, 2021; originally announced May 2021.

  8. arXiv:2103.05220  [pdf]

    eess.IV cs.CV cs.LG stat.AP

    Prediction of 5-year Progression-Free Survival in Advanced Nasopharyngeal Carcinoma with Pretreatment PET/CT using Multi-Modality Deep Learning-based Radiomics

    Authors: Bingxin Gu, Mingyuan Meng, Lei Bi, Jinman Kim, David Dagan Feng, Shaoli Song

    Abstract: Objective: Deep Learning-based Radiomics (DLR) has achieved great success in medical image analysis and has been considered a replacement for conventional radiomics that relies on handcrafted features. In this study, we aimed to explore the capability of DLR for the prediction of 5-year Progression-Free Survival (PFS) in Nasopharyngeal Carcinoma (NPC) using pretreatment PET/CT. Methods: A total of…

    Submitted 4 July, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted at Frontiers in Oncology

    Journal ref: Frontiers in Oncology, vol. 12, pp. 899352, 2022

  9. arXiv:2101.10832  [pdf, other]

    cs.CV cs.LG stat.ML

    Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

    Authors: Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

    Abstract: Due to the need to store the intermediate activations for back-propagation, end-to-end (E2E) training of deep networks usually suffers from high GPUs memory footprint. This paper aims to address this problem by revisiting the locally supervised learning, where a network is split into gradient-isolated modules and trained with local supervision. We experimentally show that simply training local mod…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: Accepted by ICLR 2021

  10. arXiv:2009.12999  [pdf, other]

    cs.LG cs.DC stat.ML

    Loosely Coupled Federated Learning Over Generative Models

    Authors: Shaoming Song, Yunfeng Shao, Jian Li

    Abstract: Federated learning (FL) was proposed to achieve collaborative machine learning among various clients without uploading private data. However, due to model aggregation strategies, existing frameworks require strict model homogeneity, limiting the application in more complicated scenarios. Besides, the communication cost of FL's model and gradient transmission is extremely high. This paper proposes…

    Submitted 27 September, 2020; originally announced September 2020.

  11. arXiv:2007.14191  [pdf, other]

    stat.ML cs.CR cs.LG

    Tempered Sigmoid Activations for Deep Learning with Differential Privacy

    Authors: Nicolas Papernot, Abhradeep Thakurta, Shuang Song, Steve Chien, Úlfar Erlingsson

    Abstract: Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to l…

    Submitted 28 July, 2020; originally announced July 2020.

  12. arXiv:2007.14080  [pdf, ps, other]

    stat.ME stat.CO

    A set of efficient methods to generate high-dimensional binary data with specified correlation structures

    Authors: Wei Jiang, Shuang Song, Lin Hou, Hongyu Zhao

    Abstract: High dimensional correlated binary data arise in many areas, such as observed genetic variations in biomedical research. Data simulation can help researchers evaluate efficiency and explore properties of different computational and statistical methods. Also, some statistical methods, such as Monte-Carlo methods, rely on data simulation. Lunn and Davies (1998) proposed linear time complexity method…

    Submitted 28 July, 2020; originally announced July 2020.

  13. arXiv:2007.02394  [pdf, other]

    cs.LG cs.CV stat.ML

    Meta-Semi: A Meta-learning Approach for Semi-supervised Learning

    Authors: Yulin Wang, Jiayi Guo, Shiji Song, Gao Huang

    Abstract: Deep learning based semi-supervised learning (SSL) algorithms have led to promising results in recent years. However, they tend to introduce multiple tunable hyper-parameters, making them less practical in real SSL scenarios where the labeled data is scarce for extensive hyper-parameter search. In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning onl…

    Submitted 7 September, 2021; v1 submitted 5 July, 2020; originally announced July 2020.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2006.06783  [pdf, other]

    cs.CR cs.LG math.OC stat.ML

    Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent

    Authors: Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta

    Abstract: We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy para…

    Submitted 2 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  15. arXiv:2004.14774  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV stat.ML

    IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

    Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet, et al. (11 additional authors not shown)

    Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

  16. arXiv:2003.03351  [pdf, ps, other]

    cs.LG stat.ML

    Tighter Bound Estimation of Sensitivity Analysis for Incremental and Decremental Data Modification

    Authors: Kaichen Zhou, Shiji Song, Gao Huang, Wu Cheng, Quan Zhou

    Abstract: In large-scale classification problems, the data set always be faced with frequent updates when a part of the data is added to or removed from the original data set. In this case, conventional incremental learning, which updates an existing classifier by explicitly modeling the data modification, is more efficient than retraining a new classifier from scratch. However, sometimes, we are more inter…

    Submitted 13 January, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

  17. arXiv:1912.00594  [pdf, other]

    cs.LG stat.ML

    Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

    Authors: Shuang Song, David Berthelot, Afshin Rostamizadeh

    Abstract: We propose using active learning based techniques to further improve the state-of-the-art semi-supervised learning MixMatch algorithm. We provide a thorough empirical evaluation of several active-learning and baseline methods, which successfully demonstrate a significant improvement on the benchmark CIFAR-10, CIFAR-100, and SVHN datasets (as much as 1.5% in absolute accuracy). We also provide an e…

    Submitted 2 December, 2019; v1 submitted 2 December, 2019; originally announced December 2019.

  18. arXiv:1911.05493  [pdf, other]

    cs.SI cs.LG stat.ML

    UrbanRhythm: Revealing Urban Dynamics Hidden in Mobility Data

    Authors: Sirui Song, Tong Xia, Depeng Jin, Pan Hui, Yong Li

    Abstract: Understanding urban dynamics, i.e., how the types and intensity of urban residents' activities in the city change along with time, is of urgent demand for building an efficient and livable city. Nonetheless, this is challenging due to the expanding urban population and the complicated spatial distribution of residents. In this paper, to reveal urban dynamics, we propose a novel system UrbanRhythm…

    Submitted 2 November, 2019; originally announced November 2019.

    Comments: Submitted to IEEE Transactions on Knowledge and Data Engineering (TKDE)

  19. arXiv:1909.12220  [pdf, other]

    cs.CV cs.LG stat.ML

    Implicit Semantic Data Augmentation for Deep Networks

    Authors: Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, Gao Huang

    Abstract: In this paper, we propose a novel implicit semantic data augmentation (ISDA) approach to complement traditional augmentation techniques like flipping, translation or rotation. Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g…

    Submitted 24 April, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Accepted by NeurIPS 2019

  20. arXiv:1909.03245  [pdf, other]

    cs.LG cs.AI stat.ML

    Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

    Authors: Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang

    Abstract: Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks. However, slow convergence and sample inefficiency remain challenging problems in RL, especially when handling continuous and high-dimensional state spaces. To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing the ide…

    Submitted 6 December, 2021; v1 submitted 7 September, 2019; originally announced September 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  21. arXiv:1909.03204  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

    Authors: Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen

    Abstract: This paper investigates trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods which employ single actor-critic but cannot realize satisfactory tracking control accuracy and stable learning, our proposed algorithm can achieve high-level tracking control accuracy of…

    Submitted 7 September, 2019; originally announced September 2019.

    Comments: IEEE Transactions on Neural Networks and Learning Systems

  22. arXiv:1909.03198  [pdf, other]

    cs.LG cs.AI stat.ML

    Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

    Authors: Wenjie Shi, Shiji Song, Cheng Wu

    Abstract: Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality such as 3D humanoid locomotion. Besides, the optimality of desired Boltzmann policy set for non-optim…

    Submitted 7 September, 2019; originally announced September 2019.

    Comments: to be published in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)

  23. arXiv:1907.11510  [pdf, ps, other]

    cs.HC cs.CV cs.IR cs.LG stat.ML

    AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition

    Authors: Fabien Ringeval, Björn Schuller, Michel Valstar, Nicholas Cummins, Roddy Cowie, Leili Tavabi, Maximilian Schmitt, Sina Alisamir, Shahin Amiriparian, Eva-Maria Messner, Siyang Song, Shuo Liu, Ziping Zhao, Adria Mallol-Ragolta, Zhao Ren, Mohammad Soleymani, Maja Pantic

    Abstract: The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions. The goal of the Challen…

    Submitted 10 July, 2019; originally announced July 2019.

  24. arXiv:1906.08230  [pdf, other]

    cs.LG q-bio.BM stat.ML

    Evaluating Protein Transfer Learning with TAPE

    Authors: Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song

    Abstract: Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 20 pages, 4 figures

  25. arXiv:1903.11239  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG stat.ML

    TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

    Authors: Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

    Abstract: We investigate whether a robot arm can learn to pick and throw arbitrary objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring reliable pre-throw conditions (e.g. initial pose of object in ma…

    Submitted 30 May, 2020; v1 submitted 27 March, 2019; originally announced March 2019.

    Comments: Summary Video: https://youtu.be/f5Zn2Up2RjQ Project webpage: https://tossingbot.cs.princeton.edu

  26. arXiv:1808.01095  [pdf, other]

    cs.LG cs.DB stat.ML

    Helix: Accelerating Human-in-the-loop Machine Learning

    Authors: Doris Xin, Litian Ma, Jialin Liu, Stephen Macke, Shuchen Song, Aditya Parameswaran

    Abstract: Data application developers and data scientists spend an inordinate amount of time iterating on machine learning (ML) workflows -- by modifying the data pre-processing, model training, and post-processing steps -- via trial-and-error to achieve the desired model performance. Existing work on accelerating machine learning focuses on speeding up one-shot execution of workflows, failing to address th…

    Submitted 3 August, 2018; originally announced August 2018.

  27. arXiv:1807.11205  [pdf, other]

    cs.LG cs.DC stat.ML

    Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

    Authors: Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, Xiaowen Chu

    Abstract: Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the communication-to-computation ratio, it may hurt the generalization ability of the models. To this end, we build a highly scalable deep learning training system for dens…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: arXiv admin note: text overlap with arXiv:1803.03383 by other authors

  28. arXiv:1803.10311  [pdf, other]

    cs.LG cs.DB cs.HC stat.ML

    How Developers Iterate on Machine Learning Workflows -- A Survey of the Applied Machine Learning Literature

    Authors: Doris Xin, Litian Ma, Shuchen Song, Aditya Parameswaran

    Abstract: Machine learning workflow development is anecdotally regarded to be an iterative process of trial-and-error with humans-in-the-loop. However, we are not aware of quantitative evidence corroborating this popular belief. A quantitative characterization of iteration can serve as a benchmark for machine learning workflow development in practice, and can aid the development of human-in-the-loop machine…

    Submitted 17 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

  29. arXiv:1803.09956  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG stat.ML

    Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

    Authors: Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

    Abstract: Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these…

    Submitted 30 September, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: To appear at the International Conference On Intelligent Robots and Systems (IROS) 2018. Project webpage: http://vpg.cs.princeton.edu Summary video: https://youtu.be/-OkyX7ZlhiU

  30. arXiv:1802.08908  [pdf, other]

    stat.ML cs.CR cs.LG

    Scalable Private Learning with PATE

    Authors: Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Úlfar Erlingsson

    Abstract: The rapid adoption of machine learning has increased concerns about the privacy implications of machine learning models trained on sensitive data, such as medical records or other personal information. To address those concerns, one promising approach is Private Aggregation of Teacher Ensembles, or PATE, which transfers to a "student" model the knowledge of an ensemble of "teacher" models, with in…

    Submitted 24 February, 2018; originally announced February 2018.

    Comments: Published as a conference paper at ICLR 2018

  31. arXiv:1802.06153  [pdf, other]

    cs.LG q-bio.PE stat.ML

    A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

    Authors: Jeffrey Chan, Valerio Perrone, Jeffrey P. Spence, Paul A. Jenkins, Sara Mathieson, Yun S. Song

    Abstract: An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential chal…

    Submitted 5 November, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: 9 pages, 8 figures

  32. arXiv:1711.04979  [pdf, other]

    quant-ph cond-mat.other cs.DS q-bio.QM stat.ML

    Quantum transport senses community structure in networks

    Authors: Chenchao Zhao, Jun S. Song

    Abstract: Quantum time evolution exhibits rich physics, attributable to the interplay between the density and phase of a wave function. However, unlike classical heat diffusion, the wave nature of quantum mechanics has not yet been extensively explored in modern data analysis. We propose that the Laplace transform of quantum transport (QT) can be used to construct an ensemble of maps from a given complex ne…

    Submitted 12 January, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Journal ref: Phys. Rev. E 98, 022301 (2018)

  33. arXiv:1707.02702  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Composition Properties of Inferential Privacy for Time-Series Data

    Authors: Shuang Song, Kamalika Chaudhuri

    Abstract: With the proliferation of mobile devices and the internet of things, developing principled solutions for privacy in time series applications has become increasingly important. While differential privacy is the gold standard for database privacy, many time series applications require a different kind of guarantee, and a number of recent works have used some form of inferential privacy to address th…

    Submitted 10 July, 2017; originally announced July 2017.

  34. arXiv:1703.10827  [pdf, other]

    stat.ML

    Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks

    Authors: Amal Rannen Triki, Matthew B. Blaschko, Yoon Mo Jung, Seungri Song, Hyun Ju Han, Seung Il Kim, Chulmin Joo

    Abstract: Objective: In this work, we perform margin assessment of human breast tissue from optical coherence tomography (OCT) images using deep neural networks (DNNs). This work simulates an intraoperative setting for breast cancer lumpectomy. Methods: To train the DNNs, we use both the state-of-the-art methods (Weight Decay and DropOut) and a newly introduced regularization method based on function norms.…

    Submitted 31 March, 2017; originally announced March 2017.

    Comments: 16 pages, 9 figures

  35. arXiv:1702.01373  [pdf, other]

    stat.ML q-bio.QM stat.CO

    Exact heat kernel on a hypersphere and its applications in kernel SVM

    Authors: Chenchao Zhao, Jun S. Song

    Abstract: Many contemporary statistical learning methods assume a Euclidean feature space. This paper presents a method for defining similarity based on hyperspherical geometry and shows that it often improves the performance of support vector machine compared to other competing similarity measures. Specifically, the idea of using heat diffusion on a hypersphere to measure similarity has been previously pro…

    Submitted 19 November, 2017; v1 submitted 4 February, 2017; originally announced February 2017.

  36. arXiv:1701.08145  [pdf]

    stat.AP

    Synthesizing Correlations with Computational Likelihood Approach: Vitamin C Data

    Authors: Myung Soon Song

    Abstract: It is known that the primary source of dietary vitamin C is fruit and vegetables and the plasma level of vitamin C has been considered a good surrogate biomarker of vitamin C intake by fruit and vegetable consumption. To combine the information about association between vitamin C intake and the plasma level of vitamin C, numerical approximation methods for likelihood function of correlation coeffi…

    Submitted 21 May, 2017; v1 submitted 21 January, 2017; originally announced January 2017.

    Comments: 15 pages, 2 figures, 5 tables

  37. arXiv:1612.03839  [pdf, other]

    cs.LG stat.ML

    Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)

    Authors: Miaoyan Wang, Yun S. Song

    Abstract: Tensor decompositions have rich applications in statistics and machine learning, and developing efficient, accurate algorithms for the problem has received much attention recently. Here, we present a new method built on Kruskal's uniqueness theorem to decompose symmetric, nearly orthogonally decomposable tensors. Unlike the classical higher-order singular value decomposition which unfolds a tensor…

    Submitted 18 April, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    Comments: 33 pages, 5 figures

    Journal ref: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, Vol. 54 (2017) 614-622

  38. arXiv:1608.02221  [pdf]

    stat.AP math.NA

    Quantile based global sensitivity measures

    Authors: Sergei Kucherenko, Shufang Song

    Abstract: New global sensitivity measures based on quantiles of the output are introduced. Such measures can be used for global sensitivity analysis of problems in which quantiles are explicitly the functions of interest and for identification of variables which are the most important in achieving extreme values of the model output. It is proven that there is a link between introduced measures and Sobol mai…

    Submitted 7 August, 2016; originally announced August 2016.

    Comments: 25 pages, 9 figures

    ACM Class: B.2.3; G.1.0; G.3

  39. arXiv:1606.00770  [pdf]

    math.NA stat.OT

    Different numerical estimators for main effect global sensitivity indices

    Authors: Sergei Kucherenko, Shufang Song

    Abstract: The variance-based method of global sensitivity indices based on Sobol sensitivity indices became very popular among practitioners due to its easiness of interpretation. For complex practical problems computation of Sobol indices generally requires a large number of function evaluations to achieve reasonable convergence. Four different direct formulas for computing Sobol main effect sensitivity in…

    Submitted 2 June, 2016; originally announced June 2016.

  40. arXiv:1603.03977  [pdf, other]

    cs.LG cs.CR stat.ML

    Pufferfish Privacy Mechanisms for Correlated Data

    Authors: Shuang Song, Yizhen Wang, Kamalika Chaudhuri

    Abstract: Many modern databases include personal and sensitive correlated data, such as private information on users connected together in a social network, and measurements of physical activity of single subjects across time. However, differential privacy, the current gold standard in data privacy, does not adequately address privacy issues in this kind of data. This work looks at a recent generalization…

    Submitted 12 March, 2017; v1 submitted 12 March, 2016; originally announced March 2016.

  41. arXiv:1409.1976  [pdf, ps, other]

    stat.ML cs.LG

    A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing

    Authors: Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen

    Abstract: The past years have witnessed many dedicated open-source projects that built and maintain implementations of Support Vector Machines (SVM), parallelized for GPU, multi-core CPUs and distributed systems. Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its popularity in many high impact applications, including genetics, neuroscience and systems biology. T…

    Submitted 5 September, 2014; originally announced September 2014.

    Comments: 10 pages

  42. arXiv:1310.1068  [pdf, ps, other]

    q-bio.PE math.FA stat.AP stat.ME

    A novel spectral method for inferring general diploid selection from time series genetic data

    Authors: Matthias Steinrücken, Anand Bhaskar, Yun S. Song

    Abstract: The increased availability of time series genetic variation data from experimental evolution studies and ancient DNA samples has created new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. However, it is a challenging problem to compute the likelihood of nonneutral models for the population allele frequency dynamics, given the…

    Submitted 26 January, 2015; v1 submitted 3 October, 2013; originally announced October 2013.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-AOAS764

    Report number: IMS-AOAS-AOAS764

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 4, 2203-2222

  43. arXiv:1309.5056  [pdf, ps, other]

    q-bio.PE math.ST stat.AP

    Descartes' rule of signs and the identifiability of population demographic models from genomic variation data

    Authors: Anand Bhaskar, Yun S. Song

    Abstract: The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has be…

    Submitted 1 December, 2014; v1 submitted 19 September, 2013; originally announced September 2013.

    Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-AOS1264

    Report number: IMS-AOS-AOS1264

    Journal ref: Annals of Statistics 2014, Vol. 42, No. 6, 2469-2493

  44. The influence of relatives on the efficiency and error rate of familial searching

    Authors: Rori V. Rohlfs, Erin Murphy, Yun S. Song, Montgomery Slatkin

    Abstract: We investigate the consequences of adopting the criteria used by the state of California, as described by Myers et al. (2011), for conducting familial searches. We carried out a simulation study of randomly generated profiles of related and unrelated individuals with 13-locus CODIS genotypes and YFiler Y-chromosome haplotypes, on which the Myers protocol for relative identification was carried out…

    Submitted 14 August, 2013; v1 submitted 10 April, 2013; originally announced April 2013.

    Comments: main text: 19 pages, 4 tables, 2 figures; supplemental text: 2 pages, 5 tables; all together as a single file

    Journal ref: PLoS ONE 8(8): e70495 (2013)

  45. arXiv:1106.3921  [pdf, other]

    stat.ML q-fin.RM q-fin.ST stat.ME

    Dynamic Large Spatial Covariance Matrix Estimation in Application to Semiparametric Model Construction via Variable Clustering: the SCE approach

    Authors: Song Song

    Abstract: To better understand the spatial structure of large panels of economic and financial time series and provide a guideline for constructing semiparametric models, this paper first considers estimating a large spatial covariance matrix of the generalized $m$-dependent and $β$-mixing time series (with $J$ variables and $T$ observations) by hard thresholding regularization as long as…

    Submitted 23 June, 2011; v1 submitted 20 June, 2011; originally announced June 2011.

  46. arXiv:1106.3915  [pdf, other]

    stat.ML q-fin.ST stat.ME

    Large Vector Auto Regressions

    Authors: Song Song, Peter J. Bickel

    Abstract: One popular approach for nonstructural economic and financial forecasting is to include a large number of economic and financial variables, which has been shown to lead to significant improvements for forecasting, for example, by the dynamic factor models. A challenging issue is to determine which variables and (their) lags are relevant, especially when there is a mixture of serial correlation (te…

    Submitted 20 June, 2011; originally announced June 2011.