Showing 1–13 of 13 results for author: Safro, I

Searching in archive stat.
  1. arXiv:2003.08420  [pdf, other]

    cs.LG cs.IT stat.ML

    Unsupervised Hierarchical Graph Representation Learning by Mutual Information Maximization

    Authors: Fei Ding, Xiaohong Zhang, Justin Sybrandt, Ilya Safro

    Abstract: Graph representation learning based on graph neural networks (GNNs) can greatly improve the performance of downstream tasks, such as node and graph classification. However, the general GNN models do not aggregate node information in a hierarchical manner, and can miss key higher-order structural features of many graphs. The hierarchical aggregation also enables the graph representations to be expl…

    Submitted 28 July, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 7 pages, 2 figures, 4 tables

    Journal ref: the 16th International Workshop on Mining and Learning with Graphs (MLG 2020)

  2. CBAG: Conditional Biomedical Abstract Generation

    Authors: Justin Sybrandt, Ilya Safro

    Abstract: Biomedical research papers use significantly different language and jargon when compared to typical English text, which reduces the utility of pre-trained NLP models in this domain. Meanwhile Medline, a database of biomedical abstracts, introduces nearly a million new documents per-year. Applications that could benefit from understanding this wealth of publicly available information, such as scien…

    Submitted 13 February, 2020; originally announced February 2020.

  3. arXiv:2002.05635  [pdf, other]

    cs.LG stat.ML

    AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach

    Authors: Justin Sybrandt, Ilya Tyagin, Michael Shtutman, Ilya Safro

    Abstract: Medical research is risky and expensive. Drug discovery, as an example, requires that researchers efficiently winnow thousands of potential targets to a small candidate set for more thorough evaluation. However, research groups spend significant time and money to perform the experiments necessary to determine this candidate set long before seeing intermediate results. Hypothesis generation systems…

    Submitted 13 February, 2020; originally announced February 2020.

  4. arXiv:1905.10953  [pdf, other]

    cs.LG cs.SI stat.ML

    FOBE and HOBE: First- and High-Order Bipartite Embeddings

    Authors: Justin Sybrandt, Ilya Safro

    Abstract: Typical graph embeddings may not capture type-specific bipartite graph features that arise in such areas as recommender systems, data visualization, and drug discovery. Machine learning methods utilized in these applications would be better served with specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose edges into sets of indirect relationships between n…

    Submitted 22 July, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

  5. arXiv:1808.06241  [pdf, other]

    stat.AP cs.SI physics.soc-ph

    Spatio-temporal prediction of crimes using network analytic approach

    Authors: Saroj Kumar Dash, Ilya Safro, Ravisutha Sakrepatna Srinivasamurthy

    Abstract: It is quite evident that majority of the population lives in urban area today than in any time of the human history. This trend seems to increase in coming years. A study [5] says that nearly 80.7% of total population in USA stays in urban area. By 2030 nearly 60% of the population in the world will live in or move to cities. With the increase in urban population, it is important to keep an eye on…

    Submitted 30 October, 2018; v1 submitted 19 August, 2018; originally announced August 2018.

  6. arXiv:1708.07534  [pdf]

    stat.AP cs.SI physics.soc-ph

    Detecting and monitoring foodborne illness outbreaks: Twitter communications and the 2015 U.S. Salmonella outbreak linked to imported cucumbers

    Authors: Yuliya V. Bolotova, Jie Lou, Ilya Safro

    Abstract: This research uses Twitter, as a social media device, to track communications related to the 2015 U.S. foodborne illness outbreak linked to Salmonella in imported cucumbers from Mexico. The relevant Twitter data are analyzed in light of the timeline of the official announcements made by the Centers for Disease Control and Prevention (CDC). The largest number of registered tweets is associated with…

    Submitted 24 August, 2017; originally announced August 2017.

  7. arXiv:1707.07657  [pdf, other]

    cs.LG cs.DS stat.CO stat.ML

    Engineering fast multilevel support vector machines

    Authors: E. Sadrfaridpour, T. Razzaghi, I. Safro

    Abstract: The computational complexity of solving nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data represents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality to linear kernels but introduce extra kernel and model parameters w…

    Submitted 5 April, 2019; v1 submitted 24 July, 2017; originally announced July 2017.

    Comments: 41 pages, 7 figures

  8. arXiv:1702.06176  [pdf, other]

    cs.IR cs.DL cs.SI q-bio.QM stat.OT

    MOLIERE: Automatic Biomedical Hypothesis Generation System

    Authors: Justin Sybrandt, Michael Shtutman, Ilya Safro

    Abstract: Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relation…

    Submitted 31 May, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

    ACM Class: H.2.8; J.3; H.3; I.5.4

  9. arXiv:1611.05487  [pdf, ps, other]

    stat.ML cs.DS cs.LG stat.CO

    Algebraic multigrid support vector machines

    Authors: Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro

    Abstract: The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets because of such reasons as the complexity and number of iterations in parameter fitting methods, underlying optimization solvers, and nonlinearity of kernels. We introduce a fast multilevel framework…

    Submitted 23 November, 2016; v1 submitted 16 November, 2016; originally announced November 2016.

  10. arXiv:1610.07703  [pdf, other]

    cs.IR stat.ML

    Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

    Authors: Chris Gropp, Alexander Herzog, Ilya Safro, Paul W. Wilson, Amy W. Apon

    Abstract: Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of text data. Traditional methods such as Dynamic Topic Modeling (DTM) do not lend themselves well to direct parallelization because of dependencies from one time ste…

    Submitted 4 October, 2019; v1 submitted 24 October, 2016; originally announced October 2016.

  11. arXiv:1604.02123  [pdf, other]

    stat.ML cs.LG stat.AP

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    Authors: Talayeh Razzaghi, Oleg Roderick, Ilya Safro, Nicholas Marko

    Abstract: This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techn…

    Submitted 7 April, 2016; originally announced April 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1503.06250

  12. arXiv:1503.06250  [pdf, other]

    stat.ML cs.LG

    Fast Imbalanced Classification of Healthcare Data with Missing Values

    Authors: Talayeh Razzaghi, Oleg Roderick, Ilya Safro, Nick Marko

    Abstract: In medical domain, data features often contain missing values. This can create serious bias in the predictive modeling. Typical standard data mining methods often produce poor performance measures. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. The proposed method is based on a multilevel framework of the cost-sensitive SV…

    Submitted 20 March, 2015; originally announced March 2015.

  13. arXiv:1410.3348  [pdf, other]

    stat.ML cs.LG

    Fast Multilevel Support Vector Machines

    Authors: Talayeh Razzaghi, Ilya Safro

    Abstract: Solving different types of optimization models (including parameters fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that scales efficiently to very large data sets. Instead of solving the whole training set in one optimization process, the support vectors are obtained and gradually r…

    Submitted 13 October, 2014; originally announced October 2014.