Showing 1–50 of 62 results for author: Safro, I

Searching in archive cs.
  1. arXiv:2404.14399  [pdf, other]

    quant-ph cs.CE physics.comp-ph

    MLQAOA: Graph Learning Accelerated Hybrid Quantum-Classical Multilevel QAOA

    Authors: Bao Bach, Jose Falla, Ilya Safro

    Abstract: Learning the problem structure at multiple levels of coarseness to inform the decomposition-based hybrid quantum-classical combinatorial optimization solvers is a promising approach to scaling up variational approaches. We introduce a multilevel algorithm reinforced with the spectral graph representation learning-based accelerator to tackle large-scale graph maximum cut instances and fused with se…

    Submitted 30 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 18 pages, 3 figures, 4 tables

  2. arXiv:2312.03303  [pdf, other]

    cs.AI cs.CL cs.LG

    Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique

    Authors: Ilya Tyagin, Ilya Safro

    Abstract: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses…

    Submitted 6 December, 2023; originally announced December 2023.

  3. arXiv:2310.11812  [pdf, other]

    cs.DS

    Open Problems in (Hyper)Graph Decomposition

    Authors: Deepak Ajwani, Rob H. Bisseling, Katrin Casel, Ümit V. Çatalyürek, Cédric Chevalier, Florian Chudigiewitsch, Marcelo Fonseca Faraj, Michael Fellows, Lars Gottesbüren, Tobias Heuer, George Karypis, Kamer Kaya, Jakub Lacki, Johannes Langguth, Xiaoye Sherry Li, Ruben Mayer, Johannes Meintrup, Yosuke Mizutani, François Pellegrini, Fabrizio Petrini, Frances Rosamond, Ilya Safro, Sebastian Schlag, Christian Schulz, Roohani Sharma, et al. (4 additional authors not shown)

    Abstract: Large networks are useful in a wide range of applications. Sometimes problem instances are composed of billions of entities. Decomposing and analyzing these structures helps us gain new insights about our surroundings. Even if the final application concerns a different problem (such as traversal, finding paths, trees, and flows), decomposing large graphs is often an important subproblem for comple…

    Submitted 18 October, 2023; originally announced October 2023.

  4. arXiv:2310.07858  [pdf, other]

    quant-ph cs.LG

    QArchSearch: A Scalable Quantum Architecture Search Package

    Authors: Ankit Kulshrestha, Danylo Lykov, Ilya Safro, Yuri Alexeev

    Abstract: The current era of quantum computing has yielded several algorithms that promise high computational efficiency. While the algorithms are sound in theory and can provide potentially exponential speedup, there is little guidance on how to design proper quantum circuits to realize the appropriate unitary transformation to be applied to the input quantum state. In this paper, we present QArchS…

    Submitted 11 October, 2023; originally announced October 2023.

    Journal ref: Published in Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2023

  5. arXiv:2307.07577  [pdf, ps, other]

    cs.DS

    Decomposition Based Refinement for the Network Interdiction Problem

    Authors: Krish Matta, Xiaoyuan Liu, Ilya Safro

    Abstract: The shortest path network interdiction (SPNI) problem poses significant computational challenges due to its NP-hardness. Current solutions, primarily based on integer programming methods, are inefficient for large-scale instances. In this paper, we introduce a novel hybrid algorithm that can utilize Ising Processing Units (IPUs) alongside classical solvers. This approach decomposes the problem int…

    Submitted 14 September, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

  6. arXiv:2306.02588  [pdf]

    cs.AI

    Literature-based Discovery for Landscape Planning

    Authors: David Marasco, Ilya Tyagin, Justin Sybrandt, James H. Spencer, Ilya Safro

    Abstract: This project demonstrates how medical corpus hypothesis generation, a knowledge discovery field of AI, can be used to derive new research angles for landscape and urban planners. The hypothesis generation approach herein consists of a combination of deep learning with topic modeling, a probabilistic approach to natural language analysis that scans aggregated research databases for words that can b…

    Submitted 5 June, 2023; originally announced June 2023.

  7. arXiv:2304.07442  [pdf, other]

    quant-ph cs.LG

    Learning To Optimize Quantum Neural Network Without Gradients

    Authors: Ankit Kulshrestha, Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Ilya Safro

    Abstract: Quantum Machine Learning is an emerging sub-field in machine learning where one of the goals is to perform pattern recognition tasks by encoding data into quantum states. This extension from classical to quantum domain has been made possible due to the development of hybrid quantum-classical algorithms that allow a parameterized quantum circuit to be optimized using gradient based algorithms that…

    Submitted 14 April, 2023; originally announced April 2023.

  8. arXiv:2210.10662  [pdf, other]

    cs.LG cs.DS

    Towards Practical Explainability with Cluster Descriptors

    Authors: Xiaoyuan Liu, Ilya Tyagin, Hayato Ushijima-Mwesigwa, Indradeep Ghosh, Ilya Safro

    Abstract: With the rapid development of machine learning, improving its explainability has become a crucial research goal. We study the problem of making the clusters more explainable by investigating the cluster descriptors. Given a set of objects $S$, a clustering of these objects $π$, and a set of tags $T$ that have not participated in the clustering algorithm. Each object in $S$ is associated with a sub…

    Submitted 20 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

  9. arXiv:2209.02895  [pdf, other]

    quant-ph cs.DS

    Constructing Optimal Contraction Trees for Tensor Network Quantum Circuit Simulation

    Authors: Cameron Ibrahim, Danylo Lykov, Zichang He, Yuri Alexeev, Ilya Safro

    Abstract: One of the key problems in tensor network based quantum circuit simulation is the construction of a contraction tree which minimizes the cost of the simulation, where the cost can be expressed in the number of operations as a proxy for the simulation running time. This same problem arises in a variety of application areas, such as combinatorial scientific computing, marginalization in probabilisti…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: IEEE HPEC 2022 submission, 5 figures, 7 pages

  10. Quantum Approximate Optimization Algorithm with Sparsified Phase Operator

    Authors: Xiaoyuan Liu, Ruslan Shaydulin, Ilya Safro

    Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is a promising candidate algorithm for demonstrating quantum advantage in optimization using near-term quantum computers. However, QAOA has high requirements on gate fidelity due to the need to encode the objective function in the phase separating operator, requiring a large number of gates that potentially do not match the hardware connectivit…

    Submitted 29 April, 2022; originally announced May 2022.

    Journal ref: 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), Broomfield, CO, USA, 2022 pp. 133-141

  11. arXiv:2204.13751  [pdf, other]

    quant-ph cs.LG

    BEINIT: Avoiding Barren Plateaus in Variational Quantum Algorithms

    Authors: Ankit Kulshrestha, Ilya Safro

    Abstract: Barren plateaus are a notorious problem in the optimization of variational quantum algorithms and pose a critical obstacle in the quest for more efficient quantum machine learning algorithms. Many potential reasons for barren plateaus have been identified but few solutions have been proposed to avoid them in practice. Existing solutions are mainly focused on the initialization of unitary gate para…

    Submitted 28 April, 2022; originally announced April 2022.

  12. Partitioning Dense Graphs with Hardware Accelerators

    Authors: Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Indradeep Ghosh, Ilya Safro

    Abstract: Graph partitioning is a fundamental combinatorial optimization problem that attracts a lot of attention from theoreticians and practitioners due to its broad applications. From multilevel graph partitioning to more general-purpose optimization solvers such as Gurobi and CPLEX, a wide range of approaches have been developed. Limitations of these approaches are important to study in order to break t…

    Submitted 21 February, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

  13. arXiv:2201.06592  [pdf, other]

    cs.IR

    Proactive Query Expansion for Streaming Data Using External Source

    Authors: Farah Alshanik, Amy Apon, Yuheng Du, Alexander Herzog, Ilya Safro

    Abstract: Query expansion is the process of reformulating the original query by adding relevant words. Choosing which terms to add in order to improve the performance of the query expansion methods or to enhance the quality of the retrieved results is an important aspect of any information retrieval system. Adding words that can positively impact the quality of the search query or are informative enough pla…

    Submitted 17 January, 2022; originally announced January 2022.

  14. arXiv:2111.08878  [pdf, other]

    cs.LG cs.CY

    CONFAIR: Configurable and Interpretable Algorithmic Fairness

    Authors: Ankit Kulshrestha, Ilya Safro

    Abstract: The rapid growth of data in the recent years has led to the development of complex learning algorithms that are often used to make decisions in real world. While the positive impact of the algorithms has been tremendous, there is a need to mitigate any bias arising from either training samples or implicit assumptions made about the data samples. This need becomes critical when algorithms are used…

    Submitted 29 December, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: Updated Algorithm name, removed redundant mentions

  15. arXiv:2102.10750  [pdf, other]

    cs.LG cs.AI

    Coping with Mistreatment in Fair Algorithms

    Authors: Ankit Kulshrestha, Ilya Safro

    Abstract: Machine learning actively impacts our everyday life in almost all endeavors and domains such as healthcare, finance, and energy. As our dependence on the machine learning increases, it is inevitable that these algorithms will be used to make decisions that will have a direct impact on the society spanning all resolutions from personal choices to world-wide policies. Hence, it is crucial to ensure…

    Submitted 21 February, 2021; originally announced February 2021.

  16. arXiv:2102.07631  [pdf, other]

    cs.IR cs.LG

    Accelerating COVID-19 research with graph mining and transformer-based learning

    Authors: Ilya Tyagin, Ankit Kulshrestha, Justin Sybrandt, Krish Matta, Michael Shtutman, Ilya Safro

    Abstract: In 2020, the White House released the, "Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset," wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rap…

    Submitted 29 September, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

  17. Layer VQE: A Variational Approach for Combinatorial Optimization on Noisy Quantum Computers

    Authors: Xiaoyuan Liu, Anthony Angone, Ruslan Shaydulin, Ilya Safro, Yuri Alexeev, Lukasz Cincio

    Abstract: Combinatorial optimization on near-term quantum devices is a promising path to demonstrating quantum advantage. However, the capabilities of these devices are constrained by high noise or error rates. In this paper, we propose an iterative Layer VQE (L-VQE) approach, inspired by the Variational Quantum Eigensolver (VQE). We present a large-scale numerical study, simulating circuits with up to 40 q…

    Submitted 11 May, 2022; v1 submitted 10 February, 2021; originally announced February 2021.

    Report number: LA-UR-21-20623

    Journal ref: IEEE Transactions on Quantum Engineering 3 (2022): 1-20

  18. Classical symmetries and the Quantum Approximate Optimization Algorithm

    Authors: Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, Ilya Safro

    Abstract: We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined…

    Submitted 27 October, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Journal ref: Quantum Inf Process 20, 359 (2021)

  19. arXiv:2012.02294  [pdf, other]

    cs.IR cs.LG

    Accelerating Text Mining Using Domain-Specific Stop Word Lists

    Authors: Farah Alshanik, Amy Apon, Alexander Herzog, Ilya Safro, Justin Sybrandt

    Abstract: Text preprocessing is an essential step in text mining. Removing words that can negatively impact the quality of prediction algorithms or are not informative enough is a crucial storage-saving technique in text indexing and results in improved computational efficiency. Typically, a generic stop word list is applied to a dataset regardless of the domain. However, many common words are different fro…

    Submitted 18 November, 2020; originally announced December 2020.

  20. arXiv:2011.02592  [pdf, other]

    cs.LG

    AML-SVM: Adaptive Multilevel Learning with Support Vector Machines

    Authors: Ehsan Sadrfaridpour, Korey Palmer, Ilya Safro

    Abstract: The support vector machines (SVM) is one of the most widely used and practical optimization based classification models in machine learning because of its interpretability and flexibility to produce high quality results. However, the big data imposes a certain difficulty to the most sophisticated but relatively slow versions of SVM, namely, the nonlinear SVM. The complexity of nonlinear SVM solver…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: 10 pages, 5 tables, 3 figures, IEEE BigData 2020

  21. arXiv:2003.08420  [pdf, other]

    cs.LG cs.IT stat.ML

    Unsupervised Hierarchical Graph Representation Learning by Mutual Information Maximization

    Authors: Fei Ding, Xiaohong Zhang, Justin Sybrandt, Ilya Safro

    Abstract: Graph representation learning based on graph neural networks (GNNs) can greatly improve the performance of downstream tasks, such as node and graph classification. However, the general GNN models do not aggregate node information in a hierarchical manner, and can miss key higher-order structural features of many graphs. The hierarchical aggregation also enables the graph representations to be expl…

    Submitted 28 July, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 7 pages, 2 figures, 4 tables

    Journal ref: the 16th International Workshop on Mining and Learning with Graphs (MLG 2020)

  22. arXiv:2003.00736  [pdf, other]

    cs.DS cs.SI

    Recent Advances in Scalable Network Generation

    Authors: Manuel Penschuck, Ulrik Brandes, Michael Hamann, Sebastian Lamm, Ulrich Meyer, Ilya Safro, Peter Sanders, Christian Schulz

    Abstract: Random graph models are frequently used as a controllable and versatile data source for experimental campaigns in various research fields. Generating such data-sets at scale is a non-trivial task as it requires design decisions typically spanning multiple areas of expertise. Challenges begin with the identification of relevant domain-specific network features, continue with the question of how to…

    Submitted 2 March, 2020; originally announced March 2020.

  23. CBAG: Conditional Biomedical Abstract Generation

    Authors: Justin Sybrandt, Ilya Safro

    Abstract: Biomedical research papers use significantly different language and jargon when compared to typical English text, which reduces the utility of pre-trained NLP models in this domain. Meanwhile Medline, a database of biomedical abstracts, introduces nearly a million new documents per-year. Applications that could benefit from understanding this wealth of publicly available information, such as scien…

    Submitted 13 February, 2020; originally announced February 2020.

  24. arXiv:2002.05635  [pdf, other]

    cs.LG stat.ML

    AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach

    Authors: Justin Sybrandt, Ilya Tyagin, Michael Shtutman, Ilya Safro

    Abstract: Medical research is risky and expensive. Drug discovery, as an example, requires that researchers efficiently winnow thousands of potential targets to a small candidate set for more thorough evaluation. However, research groups spend significant time and money to perform the experiments necessary to determine this candidate set long before seeing intermediate results. Hypothesis generation systems…

    Submitted 13 February, 2020; originally announced February 2020.

  25. arXiv:1911.09810  [pdf, other]

    cs.DS

    Leveraging Special-Purpose Hardware for Local Search Heuristics

    Authors: Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Avradip Mandal, Sarvagya Upadhyay, Ilya Safro, Arnab Roy

    Abstract: As we approach the physical limits predicted by Moore's law, a variety of specialized hardware is emerging to tackle specialized tasks in different domains. Within combinatorial optimization, adiabatic quantum computers, CMOS annealers, and optical parametric oscillators are few of the emerging specialized hardware technology aimed at solving optimization problems. In terms of mathematical framewo…

    Submitted 28 November, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    MSC Class: 68R05 90C27 90C59

  26. ELRUNA: Elimination Rule-based Network Alignment

    Authors: Zirou Qiu, Ruslan Shaydulin, Xiaoyuan Liu, Yuri Alexeev, Christopher S. Henry, Ilya Safro

    Abstract: Networks model a variety of complex phenomena across different domains. In many applications, one of the most essential tasks is to align two or more networks to infer the similarities between cross-network vertices and discover potential node-level correspondence. In this paper, we propose ELRUNA (Elimination rule-based network alignment), a novel network alignment algorithm that relies exclusive…

    Submitted 23 February, 2021; v1 submitted 29 October, 2019; originally announced November 2019.

    Journal ref: ACM J. Exp. Algorithmics 26, 1, Article 1.7 (2021)

  27. arXiv:1910.09985  [pdf, other]

    quant-ph cs.DS physics.comp-ph

    Multilevel Combinatorial Optimization Across Quantum Architectures

    Authors: Hayato Ushijima-Mwesigwa, Ruslan Shaydulin, Christian F. A. Negre, Susan M. Mniszewski, Yuri Alexeev, Ilya Safro

    Abstract: Emerging quantum processors provide an opportunity to explore new approaches for solving traditional problems in the post Moore's law supercomputing era. However, the limited number of qubits makes it infeasible to tackle massive real-world datasets directly in the near future, leading to new challenges in utilizing these quantum processors for practical purposes. Hybrid quantum-classical algorith…

    Submitted 22 September, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Report number: LA-UR-19-30113

    Journal ref: ACM Transactions on Quantum Computing 2, 1, Article 1 (March 2021)

  28. Hypergraph Partitioning With Embeddings

    Authors: Justin Sybrandt, Ruslan Shaydulin, Ilya Safro

    Abstract: Problems in scientific computing, such as distributing large sparse matrix operations, have analogous formulations as hypergraph partitioning problems. A hypergraph is a generalization of a traditional graph wherein "hyperedges" may connect any number of nodes. As a result, hypergraph partitioning is an NP-Hard problem to both solve or approximate. State-of-the-art algorithms that solve this probl…

    Submitted 25 August, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Journal ref: IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 6, pp. 2771-2782, 1 June 2022

  29. arXiv:1905.10953  [pdf, other]

    cs.LG cs.SI stat.ML

    FOBE and HOBE: First- and High-Order Bipartite Embeddings

    Authors: Justin Sybrandt, Ilya Safro

    Abstract: Typical graph embeddings may not capture type-specific bipartite graph features that arise in such areas as recommender systems, data visualization, and drug discovery. Machine learning methods utilized in these applications would be better served with specialized embedding techniques. We propose two embeddings for bipartite graphs that decompose edges into sets of indirect relationships between n…

    Submitted 22 July, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

  30. Centralities for Networks with Consumable Resources

    Authors: Hayato Ushijima-Mwesigwa, Zadid Khan, Mashrur A. Chowdhury, Ilya Safro

    Abstract: Identification of influential nodes is an important step in understanding and controlling the dynamics of information, traffic and spreading processes in networks. As a result, a number of centrality measures have been proposed and used across different application domains. At the heart of many of these measures, lies an assumption describing the manner in which traffic (of information, social act…

    Submitted 2 March, 2019; originally announced March 2019.

    Journal ref: Net Sci 7 (2019) 376-401

  31. arXiv:1902.08029  [pdf, other]

    physics.comp-ph cs.DC

    Multilevel Graph Partitioning for Three-Dimensional Discrete Fracture Network Flow Simulations

    Authors: Hayato Ushijima-Mwesigwa, Jeffrey D. Hyman, Aric Hagberg, Ilya Safro, Satish Karra, Carl W. Gable, Matthew R. Sweeney, Gowri Srinivasan

    Abstract: We present a topology-based method for mesh-partitioning in three-dimensional discrete fracture network (DFN) simulations that take advantage of the intrinsic multi-level nature of a DFN. DFN models are used to simulate flow and transport through low-permeability fractured media in the subsurface by explicitly representing fractures as discrete entities. The governing equations for flow and transp…

    Submitted 1 April, 2021; v1 submitted 18 February, 2019; originally announced February 2019.

  32. arXiv:1810.07765  [pdf, other]

    quant-ph cs.OH

    Community Detection Across Emerging Quantum Architectures

    Authors: Ruslan Shaydulin, Hayato Ushijima-Mwesigwa, Ilya Safro, Susan Mniszewski, Yuri Alexeev

    Abstract: One of the roadmap plans for quantum computers is an integration within HPC ecosystems assigning them a role of accelerators for a variety of computationally hard tasks. However, in the near term, quantum hardware will be in a constant state of change. Heading towards solving real-world problems, we advocate development of portable, architecture-agnostic hybrid quantum-classical frameworks and dem…

    Submitted 1 October, 2018; originally announced October 2018.

  33. arXiv:1808.06241  [pdf, other]

    stat.AP cs.SI physics.soc-ph

    Spatio-temporal prediction of crimes using network analytic approach

    Authors: Saroj Kumar Dash, Ilya Safro, Ravisutha Sakrepatna Srinivasamurthy

    Abstract: It is quite evident that majority of the population lives in urban area today than in any time of the human history. This trend seems to increase in coming years. A study [5] says that nearly 80.7% of total population in USA stays in urban area. By 2030 nearly 60% of the population in the world will live in or move to cities. With the increase in urban population, it is important to keep an eye on…

    Submitted 30 October, 2018; v1 submitted 19 August, 2018; originally announced August 2018.

  34. arXiv:1804.05942  [pdf, other]

    cs.IR cs.DL

    Are Abstracts Enough for Hypothesis Generation?

    Authors: Justin Sybrandt, Angelo Carrabba, Alexander Herzog, Ilya Safro

    Abstract: The potential for automatic hypothesis generation (HG) systems to improve research productivity keeps pace with the growing set of publicly available scientific information. But as data becomes easier to acquire, we must understand the effect different textual data sources have on our resulting hypotheses. Are abstracts enough for HG, or does it need full-text papers? How many papers does an HG sy…

    Submitted 20 October, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

  35. arXiv:1802.09617  [pdf, other]

    cs.SI cs.DS

    Multiscale Planar Graph Generation

    Authors: Varsha Chauhan, Alexander Gutfraind, Ilya Safro

    Abstract: The study of network representations of physical, biological, and social phenomena can help us better understand the structural and functional dynamics of their networks and formulate predictive models of these phenomena. However, due to the scarcity of real-world network data owing to factors such as cost and effort required in collection of network data and the sensitivity of this data towards t…

    Submitted 12 May, 2019; v1 submitted 26 February, 2018; originally announced February 2018.

  36. Aggregative Coarsening for Multilevel Hypergraph Partitioning

    Authors: Ruslan Shaydulin, Ilya Safro

    Abstract: Algorithms for many hypergraph problems, including partitioning, utilize multilevel frameworks to achieve a good trade-off between the performance and the quality of results. In this paper we introduce two novel aggregative coarsening schemes and incorporate them within state-of-the-art hypergraph partitioner Zoltan. Our coarsening schemes are inspired by the algebraic multigrid and stable matchin…

    Submitted 9 April, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    ACM Class: G.2.2; G.1.6; I.2.8

    Journal ref: 17th International Symposium on Experimental Algorithms (SEA 2018) 2018, vol. 103, pp. 2:1-2:15

  37. arXiv:1802.03793  [pdf, other]

    cs.IR cs.CL

    Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking

    Authors: Justin Sybrandt, Michael Shtutman, Ilya Safro

    Abstract: The first step of many research projects is to define and rank a short list of candidates for study. In the modern rapidity of scientific progress, some turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack tho…

    Submitted 5 December, 2018; v1 submitted 11 February, 2018; originally announced February 2018.

  38. Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning

    Authors: Ruslan Shaydulin, Jie Chen, Ilya Safro

    Abstract: Multilevel partitioning methods that are inspired by principles of multiscaling are the most powerful practical hypergraph partitioning solvers. Hypergraph partitioning has many applications in disciplines ranging from scientific computing to data science. In this paper we introduce the concept of algebraic distance on hypergraphs and demonstrate its use as an algorithmic component in the coarseni…

    Submitted 8 February, 2019; v1 submitted 17 October, 2017; originally announced October 2017.

    Journal ref: Multiscale Modeling & Simulation 2019 17:1, 482-506

  39. arXiv:1708.07534  [pdf]

    stat.AP cs.SI physics.soc-ph

    Detecting and monitoring foodborne illness outbreaks: Twitter communications and the 2015 U.S. Salmonella outbreak linked to imported cucumbers

    Authors: Yuliya V. Bolotova, Jie Lou, Ilya Safro

    Abstract: This research uses Twitter, as a social media device, to track communications related to the 2015 U.S. foodborne illness outbreak linked to Salmonella in imported cucumbers from Mexico. The relevant Twitter data are analyzed in light of the timeline of the official announcements made by the Centers for Disease Control and Prevention (CDC). The largest number of registered tweets is associated with…

    Submitted 24 August, 2017; originally announced August 2017.

  40. arXiv:1708.07526  [pdf]

    cs.NI

    Utility Maximization Framework for Opportunistic Wireless Electric Vehicle Charging

    Authors: MD Zadid Khan, Mashrur Chowdhury, Sakib Mahmud Khan, Ilya Safro, Hayato Ushijima-Mwesigwa

    Abstract: This is an extended abstract, it has no separate abstract section

    Submitted 6 December, 2017; v1 submitted 22 August, 2017; originally announced August 2017.

    Comments: 5 pages, 1 figure, accepted for presentation in 2018 Annual Transportation Research Board (TRB) Conference and will be included in the TRB AMOnline proceeding

  41. arXiv:1707.07657  [pdf, other]

    cs.LG cs.DS stat.CO stat.ML

    Engineering fast multilevel support vector machines

    Authors: E. Sadrfaridpour, T. Razzaghi, I. Safro

    Abstract: The computational complexity of solving nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data represents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality to linear kernels but introduce extra kernel and model parameters w…

    Submitted 5 April, 2019; v1 submitted 24 July, 2017; originally announced July 2017.

    Comments: 41 pages, 7 figures

  42. arXiv:1702.06176  [pdf, other]

    cs.IR cs.DL cs.SI q-bio.QM stat.OT

    MOLIERE: Automatic Biomedical Hypothesis Generation System

    Authors: Justin Sybrandt, Michael Shtutman, Ilya Safro

    Abstract: Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relation…

    Submitted 31 May, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

    ACM Class: H.2.8; J.3; H.3; I.5.4

  43. arXiv:1611.05487  [pdf, ps, other]

    stat.ML cs.DS cs.LG stat.CO

    Algebraic multigrid support vector machines

    Authors: Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro

    Abstract: The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets because of such reasons as the complexity and number of iterations in parameter fitting methods, underlying optimization solvers, and nonlinearity of kernels. We introduce a fast multilevel framework…

    Submitted 23 November, 2016; v1 submitted 16 November, 2016; originally announced November 2016.

  44. arXiv:1610.07703  [pdf, other]

    cs.IR stat.ML

    Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

    Authors: Chris Gropp, Alexander Herzog, Ilya Safro, Paul W. Wilson, Amy W. Apon

    Abstract: Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of text data. Traditional methods such as Dynamic Topic Modeling (DTM) do not lend themselves well to direct parallelization because of dependencies from one time ste…

    Submitted 4 October, 2019; v1 submitted 24 October, 2016; originally announced October 2016.

  45. arXiv:1610.06431  [pdf, other]

    cs.SI cs.IR

    Detecting and Summarizing Emergent Events in Microblogs and Social Media Streams by Dynamic Centralities

    Authors: Neela Avudaiappan, Alexander Herzog, Sneha Kadam, Yuheng Du, Jason Thatcher, Ilya Safro

    Abstract: Methods for detecting and summarizing emergent keywords have been extensively studied since social media and microblogging activities have started to play an important role in data analysis and decision making. We present a system for monitoring emergent keywords and summarizing a document stream based on the dynamic semantic graphs of streaming documents. We introduce the notion of dynamic eigenv…

    Submitted 20 October, 2016; originally announced October 2016.

  46. arXiv:1609.02121  [pdf, other]

    cs.SI cs.DS physics.soc-ph

    Generating realistic scaled complex networks

    Authors: Christian L. Staudt, Michael Hamann, Alexander Gutfraind, Ilya Safro, Henning Meyerhenke

    Abstract: Research on generative models is a central project in the emerging field of network science, and it studies how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks, and for verification and simulation studies. During the last two decades, a variety of mod…

    Submitted 23 March, 2017; v1 submitted 7 September, 2016; originally announced September 2016.

    Comments: 26 pages, 13 figures, extended version, a preliminary version of the paper was presented at the 5th International Workshop on Complex Networks and their Applications

  47. arXiv:1604.02123  [pdf, other]

    stat.ML cs.LG stat.AP

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    Authors: Talayeh Razzaghi, Oleg Roderick, Ilya Safro, Nicholas Marko

    Abstract: This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techn…

    Submitted 7 April, 2016; originally announced April 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1503.06250

  48. arXiv:1601.05527  [pdf, other]

    cs.SI cs.DC cs.DS

    Single- and Multi-level Network Sparsification by Algebraic Distance

    Authors: Emmanuel John, Ilya Safro

    Abstract: Network sparsification methods play an important role in modern network analysis when fast estimation of computationally expensive properties (such as the diameter, centrality indices, and paths) is required. We propose a method of network sparsification that preserves a wide range of structural properties. Depending on the analysis goals, the method allows to distinguish between local and global…

    Submitted 21 January, 2016; originally announced January 2016.

  49. arXiv:1503.06250  [pdf, other]

    stat.ML cs.LG

    Fast Imbalanced Classification of Healthcare Data with Missing Values

    Authors: Talayeh Razzaghi, Oleg Roderick, Ilya Safro, Nick Marko

    Abstract: In medical domain, data features often contain missing values. This can create serious bias in the predictive modeling. Typical standard data mining methods often produce poor performance measures. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. The proposed method is based on a multilevel framework of the cost-sensitive SV…

    Submitted 20 March, 2015; originally announced March 2015.

  50. arXiv:1410.4885  [pdf, ps, other]

    cs.DS cs.DM math.CO

    A Multilevel Bilinear Programming Algorithm For the Vertex Separator Problem

    Authors: William W. Hager, James T. Hungerford, Ilya Safro

    Abstract: The Vertex Separator Problem for a graph is to find the smallest collection of vertices whose removal breaks the graph into two disconnected subsets that satisfy specified size constraints. In the paper 10.1016/j.ejor.2014.05.042, the Vertex Separator Problem was formulated as a continuous (non-concave/non-convex) bilinear quadratic program. In this paper, we develop a more general continuous bili…

    Submitted 17 July, 2016; v1 submitted 17 October, 2014; originally announced October 2014.