Showing 1–9 of 9 results for author: Sinaei, S

Searching in archive cs.
  1. arXiv:2311.05363  [pdf, other]

    cs.LG q-bio.QM

    Beyond the training set: an intuitive method for detecting distribution shift in model-based optimization

    Authors: Farhan Damani, David H Brookes, Theodore Sternlieb, Cameron Webster, Stephen Malina, Rishi Jajoo, Kathy Lin, Sam Sinai

    Abstract: Model-based optimization (MBO) is increasingly applied to design problems in science and engineering. A common scenario involves using a fixed training set to train models, with the goal of designing new samples that outperform those present in the training data. A major challenge in this setting is distribution shift, where the distributions of training and design samples are different. While som…

    Submitted 9 November, 2023; originally announced November 2023.

  2. arXiv:2305.03136  [pdf, other]

    q-bio.PE cs.LG

    Contrastive losses as generalized models of global epistasis

    Authors: David H. Brookes, Jakub Otwinowski, Sam Sinai

    Abstract: Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically-grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is tran…

    Submitted 1 December, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

  3. arXiv:2211.10422  [pdf, other]

    q-bio.QM cs.LG math.OC q-bio.BM

    Forecasting labels under distribution-shift for machine-guided sequence design

    Authors: Lauren Berk Wheelock, Stephen Malina, Jeffrey Gerold, Sam Sinai

    Abstract: The ability to design and optimize biological sequences with specific functionalities would unlock enormous value in technology and healthcare. In recent years, machine learning-guided sequence design has progressed this goal significantly, though validating designed sequences in the lab or clinic takes many months and substantial labor. It is therefore valuable to assess the likelihood that a des…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: 15 pages, 3 figures, to appear in MLCB-PMLR proceedings, oral presentation at MLCB 2022 and LMLR 2022

  4. arXiv:2110.01221  [pdf, other]

    cs.LG

    DenDrift: A Drift-Aware Algorithm for Host Profiling

    Authors: Ali Sedaghatbaf, Sima Sinaei, Perttu Ranta-aho, Marko Koskinen

    Abstract: Detecting and reacting to unauthorized actions is an essential task in security monitoring. What make this task challenging are the large number and various categories of hosts and processes to monitor. To these we should add the lack of an exact definition of normal behavior for each category. Host profiling using stream clustering algorithms is an effective means of analyzing hosts' behaviors, c…

    Submitted 4 October, 2021; originally announced October 2021.

  5. arXiv:2010.10614  [pdf, other]

    q-bio.QM cs.LG q-bio.BM q-bio.PE

    A primer on model-guided exploration of fitness landscapes for biological sequence design

    Authors: Sam Sinai, Eric D Kelsic

    Abstract: Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields which result in biologists finding the progress in machine…

    Submitted 23 October, 2020; v1 submitted 4 October, 2020; originally announced October 2020.

  6. arXiv:2010.02141  [pdf, other]

    cs.LG math.OC q-bio.BM q-bio.QM

    AdaLead: A simple and robust adaptive greedy search algorithm for sequence design

    Authors: Sam Sinai, Richard Wang, Alexander Whatley, Stewart Slocum, Elina Locane, Eric D. Kelsic

    Abstract: Efficient design of biological sequences will have a great impact across many industrial and healthcare domains. However, discovering improved sequences requires solving a difficult optimization problem. Traditionally, this challenge was approached by biologists through a model-free method known as "directed evolution", the iterative process of random mutation and selection. As the ability to buil…

    Submitted 5 October, 2020; originally announced October 2020.

  7. arXiv:2004.11206  [pdf]

    cs.LG cs.DC eess.SP

    Multi-level Binarized LSTM in EEG Classification for Wearable Devices

    Authors: Najmeh Nazari, Seyed Ahmad Mirsalari, Sima Sinaei, Mostafa E. Salehi, Masoud Daneshtalab

    Abstract: Long Short-Term Memory (LSTM) is widely used in various sequential applications. Complex LSTMs could be hardly deployed on wearable and resourced-limited devices due to the huge amount of computations and memory requirements. Binary LSTMs are introduced to cope with this problem, however, they lead to significant accuracy loss in some application such as EEG classification which is essential to be…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: To appear in IEEE International Conference on Parallel, Distributed and Network-based Processing in 2020. arXiv admin note: text overlap with arXiv:1812.04818 by other authors

    MSC Class: 03B05 ACM Class: I.2.6; B.8.2

  8. arXiv:2004.08914  [pdf]

    eess.SP cs.LG cs.NE

    MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG signal Classification

    Authors: Seyed Ahmad Mirsalari, Sima Sinaei, Mostafa E. Salehi, Masoud Daneshtalab

    Abstract: Recurrent Neural Networks (RNN) are widely used for learning sequences in applications such as EEG classification. Complex RNNs could be hardly deployed on wearable devices due to their computation and memory-intensive processing patterns. Generally, reduction in precision leads much more efficiency and binarized RNNs are introduced as energy-efficient solutions. However, naive binarization method…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: To appear in IEEE International Symposium on Circuits & Systems in 2020. arXiv admin note: text overlap with arXiv:1807.04093 by other authors

    MSC Class: 03B05 ACM Class: I.2.6; B.8.2

  9. arXiv:1712.03346  [pdf, other]

    q-bio.QM cs.LG

    Variational auto-encoding of protein sequences

    Authors: Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak

    Abstract: Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variat…

    Submitted 3 January, 2018; v1 submitted 9 December, 2017; originally announced December 2017.

    Comments: Abstract for oral presentation at NIPS 2017 Workshop on Machine Learning in Computational Biology