Showing 1–27 of 27 results for author: Farnoud, F

Searching in archive cs.
  1. arXiv:2310.00968  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

    Authors: Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

    Abstract: Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret b…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 28 pages, 1 figure

  2. arXiv:2303.08816  [pdf, other]

    cs.LG stat.ML

    Borda Regret Minimization for Generalized Linear Dueling Bandits

    Authors: Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu

    Abstract: Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cov…

    Submitted 25 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 33 pages, 5 figures. This version includes new results for dueling bandits in the adversarial setting

  3. arXiv:2210.11818  [pdf, ps, other]

    cs.IT

    Non-binary Codes for Correcting a Burst of at Most t Deletions

    Authors: Shuche Wang, Yuanyuan Tang, Jin Sima, Ryan Gabrys, Farzad Farnoud

    Abstract: The problem of correcting deletions has received significant attention, partly because of the prevalence of these errors in DNA data storage. In this paper, we study the problem of correcting a consecutive burst of at most $t$ deletions in non-binary sequences. We first propose a non-binary code correcting a burst of at most 2 deletions for $q$-ary alphabets. Afterwards, we extend this result to t…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 20 pages. The paper has been submitted to IEEE Transactions on Information Theory. It was presented in part at ISIT 2021 and Allerton 2022

  4. arXiv:2208.02330  [pdf, other]

    cs.IT

    Low-redundancy codes for correcting multiple short-duplication and edit errors

    Authors: Yuanyuan Tang, Shuche Wang, Hao Lou, Ryan Gabrys, Farzad Farnoud

    Abstract: Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simulta…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 21 pages. The paper has been submitted to IEEE Transactions on Information Theory. It was presented in part at ISIT 2021 and ISIT 2022

  5. arXiv:2110.04136  [pdf, other]

    cs.LG

    Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

    Authors: Yue Wu, Tao Jin, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu

    Abstract: In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons from users and improves the users' average accuracy by maintaining a…

    Submitted 8 October, 2021; originally announced October 2021.

  6. Data Deduplication with Random Substitutions

    Authors: Hao Lou, Farzad Farnoud

    Abstract: Data deduplication saves storage space by identifying and removing repeats in the data stream. Compared with traditional compression methods, data deduplication schemes are more time efficient and are thus widely used in large scale storage systems. In this paper, we provide an information-theoretic analysis on the performance of deduplication algorithms on data streams in which repeats are not ex…

    Submitted 26 May, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

  7. arXiv:2011.05896  [pdf, ps, other]

    cs.IT

    Error-correcting Codes for Short Tandem Duplication and Substitution Errors

    Authors: Yuanyuan Tang, Farzad Farnoud

    Abstract: Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and substitution errors. We focus on tandem rep…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 9 pages

  8. arXiv:2008.08174  [pdf, other]

    cs.IT

    Error-correcting Codes for Noisy Duplication Channels

    Authors: Yuanyuan Tang, Farzad Farnoud

    Abstract: Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tande…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: 14 pages, 2 figures, Allerton, TIT draft

  9. arXiv:2005.03248  [pdf, other]

    cs.IT

    Coding for Optimized Writing Rate in DNA Storage

    Authors: Siddharth Jain, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

    Abstract: A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied…

    Submitted 13 May, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: To appear in ISIT 2020

  10. arXiv:1912.01211  [pdf, other]

    cs.LG stat.ML

    Rank Aggregation via Heterogeneous Thurstone Preference Models

    Authors: Tao Jin, Pan Xu, Quanquan Gu, Farzad Farnoud

    Abstract: We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. U…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: 36 pages, 2 figures, 8 tables. In AAAI 2020

  11. Single-Error Detection and Correction for Duplication and Substitution Channels

    Authors: Yuanyuan Tang, Yonatan Yehezkeally, Moshe Schwartz, Farzad Farnoud

    Abstract: Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitutions is unrestricted. Both error-detecting and err…

    Submitted 28 June, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: Author-submitted, peer-reviewed, version

  12. arXiv:1812.02250  [pdf, other]

    cs.IT

    Evolution of $k$-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems

    Authors: Hao Lou, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

    Abstract: Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analy…

    Submitted 5 December, 2018; originally announced December 2018.

  13. arXiv:1809.04702  [pdf, other]

    cs.IT

    Reconciling Similar Sets of Data

    Authors: Ryan Gabrys, Farzad Farnoud

    Abstract: In this work, we consider the problem of synchronizing two sets of data where the size of the symmetric difference between the sets is small and, in addition, the elements in the symmetric difference are related through the Hamming distance metric. Upper and lower bounds are derived on the minimum amount of information exchange. Furthermore, explicit encoding and decoding algorithms are provided f…

    Submitted 12 September, 2018; originally announced September 2018.

  14. arXiv:1808.06062  [pdf, ps, other]

    cs.IT math.CO

    The Capacity of Some Pólya String Models

    Authors: Ohad Elishco, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

    Abstract: We study random string-duplication systems, which we call Pólya string models. These are motivated by DNA storage in living organisms, and certain random mutation processes that affect their genome. Unlike previous works that study the combinatorial capacity of string-duplication systems, or various string statistics, this work provides exact capacity or bounds on it, for several probabilistic mod…

    Submitted 18 August, 2018; originally announced August 2018.

  15. arXiv:1611.05537  [pdf, other]

    cs.IT cs.DM q-bio.GN

    Duplication Distance to the Root for Binary Sequences

    Authors: Noga Alon, Jehoshua Bruck, Farzad Farnoud, Siddharth Jain

    Abstract: We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form $\seq x = \seq a \seq b \seq c \to \seq y = \seq a \seq b \seq b \seq c$, where $\seq x$ and $\seq y$ are sequences and $\seq a$, $\seq b$, and $\seq c$ are their substrings, needed to generate a binary sequence of le…

    Submitted 16 November, 2016; originally announced November 2016.

    Comments: submitted to IEEE Transactions on Information Theory

  16. Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

    Authors: Siddharth Jain, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

    Abstract: The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we prov…

    Submitted 1 June, 2016; originally announced June 2016.

    Comments: Submitted to IEEE Transactions on Information Theory

  17. arXiv:1509.06029  [pdf, other]

    cs.IT cs.DM cs.FL q-bio.GN

    Capacity and Expressiveness of Genomic Tandem Duplication

    Authors: Siddharth Jain, Farzad Farnoud, Jehoshua Bruck

    Abstract: The majority of the human genome consists of repeated sequences. An important type of repeated sequences common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence $AGTC\underline{TGTG}C$, $TGTG$ is a tandem repeat, that may be generated from $AGTCTGC$ by a tandem duplication of length $2$. In this work, we investigate the possibil…

    Submitted 20 September, 2015; originally announced September 2015.

    Comments: 19 pages, 3 figures, submitted to IEEE Transactions on Information Theory
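
The tandem duplication example in the abstract above ($AGTCTGC \to AGTCTGTGC$) can be reproduced with a few lines of Python. This is only an illustrative sketch of the operation as the abstract describes it, not code from the paper; the function name `tandem_duplicate` and its `start`/`length` parameters are assumptions introduced here.

```python
def tandem_duplicate(seq: str, start: int, length: int) -> str:
    """Insert a copy of seq[start:start+length] immediately after itself.

    Hypothetical helper: the name and the 0-based `start` index are
    illustrative choices, not notation from the listed paper.
    """
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("duplicated block must lie inside the sequence")
    end = start + length
    block = seq[start:end]
    # Prefix up to the end of the block, then the copy, then the suffix.
    return seq[:end] + block + seq[end:]


# Duplicating the length-2 block "TG" at index 4 of AGTCTGC.
result = tandem_duplicate("AGTCTGC", 4, 2)
print(result)
```

Here the block `TG` starting at index 4 is copied next to itself, which yields the tandem repeat `TGTG` inside the longer sequence.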

  18. arXiv:1401.4634  [pdf, ps, other]

    cs.IT cs.CL

    The Capacity of String-Replication Systems

    Authors: Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

    Abstract: It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules…

    Submitted 18 January, 2014; originally announced January 2014.

  19. arXiv:1401.3093  [pdf, ps, other]

    cs.IT

    Rate-Distortion for Ranking with Incomplete Information

    Authors: Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

    Abstract: We study the rate-distortion relationship in the set of permutations endowed with the Kendall Tau metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall Tau metric we provide bounds for small, medium…

    Submitted 14 January, 2014; originally announced January 2014.
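
For readers unfamiliar with the two metrics named in the abstract above, the following sketch computes them using their standard definitions: the Kendall tau distance counts the pairs of items that the two rankings order differently, and the Chebyshev distance is the largest coordinate-wise difference between the two permutation vectors. The function names and the quadratic-time pair count are illustrative choices made here, not taken from the paper.

```python
from itertools import combinations


def kendall_tau(p, q):
    """Number of item pairs ranked in opposite order by p and q.

    Assumes p and q are permutations of the same items (not checked).
    """
    pos_in_q = {item: i for i, item in enumerate(q)}
    mapped = [pos_in_q[item] for item in p]
    # Count inversions of p when read in q's coordinates.
    return sum(1 for i, j in combinations(range(len(mapped)), 2)
               if mapped[i] > mapped[j])


def chebyshev(p, q):
    """Largest absolute difference between corresponding entries."""
    return max((abs(a - b) for a, b in zip(p, q)), default=0)


p = [1, 2, 3, 4]
q = [2, 1, 4, 3]
print(kendall_tau(p, q), chebyshev(p, q))
```

For `p` and `q` above, two pairs (1, 2) and (3, 4) are swapped, and no entry moves by more than one, so the two distances are 2 and 1.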

  20. arXiv:1312.2163  [pdf, ps, other]

    cs.IT

    Multipermutation Codes in the Ulam Metric for Nonvolatile Memories

    Authors: Farzad Farnoud, Olgica Milenkovic

    Abstract: We address the problem of multipermutation code design in the Ulam metric for novel storage applications. Multipermutation codes are suitable for flash memory where cell charges may share the same rank. Changes in the charges of cells manifest themselves as errors whose effects on the retrieved signal may be measured via the Ulam distance. As part of our analysis, we study multipermutation codes i…

    Submitted 7 December, 2013; originally announced December 2013.
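
As background for the Ulam distance mentioned in the abstract above, here is a minimal sketch for ordinary permutations (no repeated ranks), where the distance equals the length of the permutation minus the length of the longest common subsequence. The multipermutation setting studied in the paper is more general and is not covered by this sketch; the function name is an assumption introduced here.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """Ulam distance between two permutations of the same items.

    Computed as len(p) minus the longest common subsequence length,
    which for permutations reduces to a longest increasing subsequence
    after relabeling p by positions in q. Inputs are not validated.
    """
    pos_in_q = {item: i for i, item in enumerate(q)}
    tails = []  # tails[k] = smallest tail of an increasing run of length k+1
    for item in p:
        x = pos_in_q[item]
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(p) - len(tails)


# Moving item 2 from the second position to the end is one translocation.
print(ulam_distance([1, 2, 3, 4, 5], [1, 3, 4, 5, 2]))
```

The patience-sorting style loop keeps the computation at O(n log n), which is a convenience of this sketch rather than a claim about the paper's methods.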

  21. arXiv:1307.4339  [pdf, other]

    cs.DS cs.IT

    Computing Similarity Distances Between Rankings

    Authors: Farzad Farnoud, Lili Su, Gregory J. Puleo, Olgica Milenkovic

    Abstract: We address the problem of computing distances between rankings that take into account similarities between candidates. The need for evaluating such distances is governed by applications as diverse as rank aggregation, bioinformatics, social sciences and data storage. The problem may be summarized as follows: Given two rankings and a positive cost function on transpositions that depends on the simi…

    Submitted 19 November, 2014; v1 submitted 16 July, 2013; originally announced July 2013.

    Comments: 32 pages, 14 figures. Corrected proof of unbalanced case

  22. arXiv:1212.2607  [pdf, ps, other]

    cs.SI

    A General Framework for Distributed Vote Aggregation

    Authors: Behrouz Touri, Farzad Farnoud, Angelia Neidic, Olgica Milenkovic

    Abstract: We present a general model for opinion dynamics in a social network together with several possibilities for object selections at times when the agents are communicating. We study the limiting behavior of such a dynamics and show that this dynamics almost surely converges. We consider some special implications of the convergence result for gossip and top-$k$ selective gossip models. In particular,…

    Submitted 11 December, 2012; originally announced December 2012.

  23. arXiv:1212.1471  [pdf, ps, other]

    cs.GT

    A Novel Distance-Based Approach to Constrained Rank Aggregation

    Authors: Farzad Farnoud, Olgica Milenkovic, Behrouz Touri

    Abstract: We consider a classical problem in choice theory -- vote aggregation -- using novel distance measures between permutations that arise in several practical applications. The distance measures are derived through an axiomatic approach, taking into account various issues arising in voting with side constraints. The side constraints of interest include non-uniform relevance of the top and the bottom o…

    Submitted 6 December, 2012; originally announced December 2012.

  24. arXiv:1206.5343  [pdf, ps, other]

    cs.DS math.CO

    Nonuniform Vote Aggregation Algorithms

    Authors: Farzad Farnoud, Behrouz Touri, Olgica Milenkovic

    Abstract: We consider the problem of non-uniform vote aggregation, and in particular, the algorithmic aspects associated with the aggregation process. For a novel class of weighted distance measures on votes, we present two different aggregation methods. The first algorithm is based on approximating the weighted distance measure by Spearman's footrule distance, with provable constant approximation guarantee…

    Submitted 22 June, 2012; originally announced June 2012.

  25. arXiv:1202.0932  [pdf, ps, other]

    cs.IT

    Error-Correction in Flash Memories via Codes in the Ulam Metric

    Authors: Farzad Farnoud, Vitaly Skachek, Olgica Milenkovic

    Abstract: We consider rank modulation codes for flash memories that allow for handling arbitrary charge-drop errors. Unlike classical rank modulation codes used for correcting errors that manifest themselves as swaps of two adjacently ranked elements, the proposed \emph{translocation rank codes} account for more general forms of errors that arise in storage systems. Translocations represent a natural extens…

    Submitted 21 April, 2013; v1 submitted 4 February, 2012; originally announced February 2012.

  26. arXiv:1202.0925  [pdf, ps, other]

    cs.IT

    Alternating Markov Chains for Distribution Estimation in the Presence of Errors

    Authors: Farzad Farnoud, Narayana P. Santhanam, Olgica Milenkovic

    Abstract: We consider a class of small-sample distribution estimators over noisy channels. Our estimators are designed for repetition channels, and rely on properties of the runs of the observed sequences. These runs are modeled via a special type of Markov chains, termed alternating Markov chains. We show that alternating chains have redundancy that scales sub-linearly with the lengths of the sequences, an…

    Submitted 4 February, 2012; originally announced February 2012.

  27. arXiv:1007.4236  [pdf, ps, other]

    cs.IT

    Sorting of Permutations by Cost-Constrained Transpositions

    Authors: Farzad Farnoud, Olgica Milenkovic

    Abstract: We address the problem of finding the minimum decomposition of a permutation in terms of transpositions with non-uniform cost. For arbitrary non-negative cost functions, we describe polynomial-time, constant-approximation decomposition algorithms. For metric-path costs, we describe exact polynomial-time decomposition algorithms. Our algorithms represent a combination of Viterbi-type algorithms and…

    Submitted 23 July, 2010; originally announced July 2010.