Showing 1–19 of 19 results for author: Boucher, C

Searching in archive cs.

  1. arXiv:2402.04258  [pdf, other]

    eess.IV cs.DB q-bio.QM

    MAPLES-DR: MESSIDOR Anatomical and Pathological Labels for Explainable Screening of Diabetic Retinopathy

    Authors: Gabriel Lepetit-Aimon, Clément Playout, Marie Carole Boucher, Renaud Duval, Michael H Brent, Farida Cheriet

    Abstract: Reliable automatic diagnosis of Diabetic Retinopathy (DR) and Macular Edema (ME) is an invaluable asset in improving the rate of monitored patients among at-risk populations and in enabling earlier treatments before the pathology progresses and threatens vision. However, the explainability of screening models is still an open question, and specifically designed datasets are required to support the…

    Submitted 19 January, 2024; originally announced February 2024.

    Comments: 1 pages, 7 figures

  2. arXiv:2309.01055  [pdf, other]

    cs.RO cs.AI

    Integration of Vision-based Object Detection and Grasping for Articulated Manipulator in Lunar Conditions

    Authors: Camille Boucher, Gustavo H. Diaz, Shreya Santra, Kentaro Uno, Kazuya Yoshida

    Abstract: The integration of vision-based frameworks to achieve lunar robot applications faces numerous challenges such as terrain configuration or extreme lighting conditions. This paper presents a generic task pipeline using object detection, instance segmentation and grasp detection, that can be used for various applications by using the results of these vision-based systems in a different way. We achiev…

    Submitted 2 September, 2023; originally announced September 2023.

  3. arXiv:2308.07809  [pdf, other]

    cs.DS

    Another virtue of wavelet forests?

    Authors: Christina Boucher, Travis Gagie, Aaron Hong, Yansong Li, Norbert Zeh

    Abstract: A wavelet forest for a text $T [1..n]$ over an alphabet $σ$ takes $n H_0 (T) + o (n \log σ)$ bits of space and supports access and rank on $T$ in $O (\log σ)$ time. Kärkkäinen and Puglisi (2011) implicitly introduced wavelet forests and showed that when $T$ is the Burrows-Wheeler Transform (BWT) of a string $S$, then a wavelet forest for $T$ occupies space bounded in terms of higher-order empirica…

    Submitted 15 August, 2023; originally announced August 2023.

  4. arXiv:2305.05893  [pdf, other]

    cs.DS

    Acceleration of FM-index Queries Through Prefix-free Parsing

    Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, Travis Gagie

    Abstract: FM-indexes are a crucial data structure in DNA alignment, for example, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word-boundaries, however, it is necessary to…

    Submitted 10 May, 2023; originally announced May 2023.

  5. arXiv:2207.07458  [pdf]

    stat.ML cs.LG

    Joint Application of the Target Trial Causal Framework and Machine Learning Modeling to Optimize Antibiotic Therapy: Use Case on Acute Bacterial Skin and Skin Structure Infections due to Methicillin-resistant Staphylococcus aureus

    Authors: Inyoung Jun, Simone Marini, Christina A. Boucher, J. Glenn Morris, Jiang Bian, Mattia Prosperi

    Abstract: Bacterial infections are responsible for high mortality worldwide. Antimicrobial resistance underlying the infection, and multifaceted patient's clinical status can hamper the correct choice of antibiotic treatment. Randomized clinical trials provide average treatment effect estimates but are not ideal for risk stratification and optimization of therapeutic choice, i.e., individualized treatment e…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Published in the Proceedings of the KDD Workshop on Applied Data Science for Healthcare (DSHealth 2022), held in Washington, D.C., on August 14, 2022

  6. arXiv:2107.03383  [pdf, other]

    q-bio.GN cs.LG

    Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions

    Authors: Mattia Prosperi, Simone Marini, Christina Boucher, Jiang Bian

    Abstract: Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR predict…

    Submitted 23 July, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: In DSHealth '21: Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare, Aug 14–18, 2021, Virtual, 5 pages

  7. arXiv:2106.11191  [pdf, other]

    cs.DS

    Computing the original eBWT faster, simpler, and with less memory

    Authors: Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, Marinella Sciortino

    Abstract: Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings, however, since this introduction, it has been used more generally to describe any BWT of a collection of strings and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algo…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: 20 pages, 5 figures, 1 table

  8. arXiv:2011.05610  [pdf, ps, other]

    cs.DS

    PHONI: Streamed Matching Statistics with Multi-Genome References

    Authors: Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi

    Abstract: Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this pape…

    Submitted 11 February, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: Our code is available at https://github.com/koeppl/phoni

  9. arXiv:2006.11687  [pdf, other]

    cs.DS

    PFP Data Structures

    Authors: Christina Boucher, Ondřej Cvacho, Travis Gagie, Jan Holub, Giovanni Manzini, Gonzalo Navarro, Massimiliano Rossi

    Abstract: Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string $S$, it produces a dictionary $D$ and a parse $P$ of overlapping phrases such that $\mathrm{BWT} (S)$ can be computed from $D$ and $P$ in time and workspace bounded in terms of their combined size…

    Submitted 20 June, 2020; originally announced June 2020.

  10. arXiv:1912.00913  [pdf]

    stat.AP cs.SE

    Automated metrics calculation in a dynamic heterogeneous environment

    Authors: Craig Boucher, Ulf Knoblich, Daniel Miller, Sasha Patotski, Amin Saied, Venky Venkateshaiah

    Abstract: A consistent theme in software experimentation at Microsoft has been solving problems of experimentation at scale for a diverse set of products. Running experiments at scale (i.e., many experiments on many users) has become state of the art across the industry. However, providing a single platform that allows software experimentation in a highly heterogeneous and constantly evolving ecosystem remai…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: 5 pages, MIT Code

  11. arXiv:1908.01263  [pdf, ps, other]

    cs.DS q-bio.GN

    Matching reads to many genomes with the $r$-index

    Authors: Taher Mun, Alan Kuhnle, Christina Boucher, Travis Gagie, Ben Langmead, Giovanni Manzini

    Abstract: The $r$-index is a tool for compressed indexing of genomic databases for exact pattern matching, which can be used to completely align reads that perfectly match some part of a genome in the database or to find seeds for reads that do not. This paper shows how to download and install the programs ri-buildfasta and ri-align; how to call ri-buildfasta on a FASTA file to build an $r$-index for that f…

    Submitted 3 August, 2019; originally announced August 2019.

  12. arXiv:1811.06933  [pdf, other]

    cs.DS

    Efficient Construction of a Complete Index for Pan-Genomics Read Alignment

    Authors: Alan Kuhnle, Taher Mun, Christina Boucher, Travis Gagie, Ben Langmead, Giovanni Manzini

    Abstract: While short read aligners, which predominantly use the FM-index, are able to easily index one or a few human genomes, they do not scale well to indexing databases containing thousands of genomes. To understand why, it helps to examine the main components of the FM-index in more detail, which is a rank data structure over the Burrows-Wheeler Transform (BWT) of the string that will allow us to find…

    Submitted 16 November, 2018; originally announced November 2018.

  13. arXiv:1803.11245  [pdf, other]

    cs.DS

    Prefix-Free Parsing for Building Big BWTs

    Authors: Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, Taher Mun

    Abstract: High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive, a characteristic that can be exploited to ease the computation of…

    Submitted 16 November, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: Preliminary version appeared at WABI '18; full version submitted to a journal

  14. arXiv:1506.03262  [pdf, other]

    cs.DS

    Relative Select

    Authors: Christina Boucher, Alexander Bowe, Travis Gagie, Giovanni Manzini, Jouni Sirén

    Abstract: Motivated by the problem of storing coloured de Bruijn graphs, we show how, if we can already support fast select queries on one string, then we can store a little extra information and support fairly fast select queries on a similar string.

    Submitted 10 June, 2015; originally announced June 2015.

  15. arXiv:1411.5890  [pdf, other]

    q-bio.GN cs.CE

    Misassembly Detection using Paired-End Sequence Reads and Optical Mapping Data

    Authors: Martin D. Muggli, Simon J. Puglisi, Roy Ronen, Christina Boucher

    Abstract: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method that will enhance the quality of draft genomes by identifying and removing misassembly errors using paired short read sequence data and optical mapping data. We apply our method to various assemblies of the loblolly pine and Francisella tularensis genomes. Our results de…

    Submitted 20 November, 2014; originally announced November 2014.

    Comments: 14 pages, 4 figures. Submitted to RECOMB. Preparing to submit to Genome Biology

  16. arXiv:1411.2718  [pdf, other]

    cs.DS q-bio.GN q-bio.QM

    Variable-Order de Bruijn Graphs

    Authors: Christina Boucher, Alex Bowe, Travis Gagie, Simon J. Puglisi, Kunihiko Sadakane

    Abstract: The de Bruijn graph $G_K$ of a set of strings $S$ is a key data structure in genome assembly that represents overlaps between all the $K$-length substrings of $S$. Construction and navigation of the graph is a space and time bottleneck in practice and the main hurdle for assembling large, eukaryote genomes. This problem is compounded by the fact that state-of-the-art assemblers do not build the de…

    Submitted 17 November, 2014; v1 submitted 11 November, 2014; originally announced November 2014.

    Comments: Conference submission, 10 pages, +minor corrections

  17. arXiv:1408.5592  [pdf, other]

    cs.CE

    HyDA-Vista: Towards Optimal Guided Selection of k-mer Size for Sequence Assembly

    Authors: Seyed Basir Shariat Razavi, Narjes Sadat Movahedi Tabrizi, Hamidreza Chitsaz, Christina Boucher

    Abstract: Motivation: Intimately tied to assembly quality is the complexity of the de Bruijn graph built by the assembler. Thus, there have been many paradigms developed to decrease the complexity of the de Bruijn graph. One obvious combinatorial paradigm for this is to allow the value of $k$ to vary; having a larger value of $k$ where the graph is more complex and a smaller value of $k$ where the graph wou…

    Submitted 24 August, 2014; originally announced August 2014.

    Comments: 11 pages, 1 figure, 1 table

  18. arXiv:1202.2820  [pdf, other]

    cs.DS

    On Approximating String Selection Problems with Outliers

    Authors: Christina Boucher, Gad M. Landau, Avivit Levy, David Pritchard, Oren Weimann

    Abstract: Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We pro…

    Submitted 13 February, 2012; originally announced February 2012.

  19. arXiv:1111.0376  [pdf, other]

    cs.DS

    Outlier Detection for DNA Fragment Assembly

    Authors: Christina Boucher, Christine Lo, Daniel Lokshtanov

    Abstract: Given $n$ length-$\ell$ strings $S =\{s_1, ..., s_n\}$ over a constant size alphabet $Σ$ together with parameters $d$ and $k$, the objective in the Consensus String with Outliers problem is to find a subset $S^*$ of $S$ of size $n-k$ and a string $s$ such that $\sum_{s_i \in S^*} d(s_i, s) \leq d$. Here $d(x, y)$ denotes the Hamming distance between the two strings $x$ and $y$. We prove 1.…

    Submitted 7 November, 2011; v1 submitted 1 November, 2011; originally announced November 2011.

    Comments: 29 pages, 1 figure
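
The space bound quoted in item 3 is stated in terms of the zeroth-order empirical entropy, which under the standard definition is $H_0(T) = \sum_{c} \frac{n_c}{n} \log_2 \frac{n}{n_c}$, where $n_c$ is the number of occurrences of character $c$ in $T$. The minimal Python sketch below computes this quantity so the leading term $n H_0(T)$ can be evaluated on a toy text; `empirical_entropy_h0` is a hypothetical helper for illustration, not code from the paper.

```python
import math
from collections import Counter


def empirical_entropy_h0(text: str) -> float:
    """Zeroth-order empirical entropy of `text`, in bits per character."""
    n = len(text)
    # Sum over distinct characters c of (n_c / n) * log2(n / n_c)
    return sum((count / n) * math.log2(n / count) for count in Counter(text).values())


if __name__ == "__main__":
    t = "abracadabra"
    h0 = empirical_entropy_h0(t)
    # n * H_0(T) is the leading term of the wavelet-forest space bound, in bits
    print(f"H0 = {h0:.4f} bits/char, n*H0 = {len(t) * h0:.1f} bits")
```

The value is 0 for a text made of a single repeated character and reaches $\log_2 σ$ when all $σ$ characters occur equally often.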
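
Items 4 and 12 both build on the FM-index, i.e., rank queries over the BWT. The minimal Python sketch below, assuming a text terminated by a unique sentinel `$` that sorts before every other character, builds the BWT naively by sorting rotations and counts pattern occurrences with classic backward search. Each pattern character costs two rank queries, which corresponds to the per-character random accesses mentioned in item 4. Here rank is a naive linear scan standing in for a real rank data structure, and the function names are illustrative only.

```python
from collections import Counter
from typing import Dict


def bwt(text: str) -> str:
    """Naive BWT: last column of the sorted rotations (text must end with '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of occurrences of pattern in the text."""
    # C[c] = number of characters in the text that are strictly smaller than c
    counts = Counter(bwt_str)
    c_table: Dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    def rank(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; a real FM-index uses a rank structure
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # process the pattern right to left
        if ch not in c_table:
            return 0
        lo = c_table[ch] + rank(ch, lo)
        hi = c_table[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana$")
    print(b, count_occurrences(b, "ana"))
```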
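
Item 7 refers to the original eBWT of Mantaci et al., which sorts the conjugates (rotations) of all input strings by the ω-order, i.e., by comparing the infinite periodic words $u^ω = uuu\cdots$, and outputs the last character of each conjugate. The quadratic Python sketch below is a naive reference implementation of that definition, not the linear-time algorithm of the paper. It assumes non-empty, primitive input strings and compares two conjugates $u$ and $v$ on their first $|u|+|v|$ characters, which suffices by the Fine and Wilf periodicity lemma. The function names are hypothetical.

```python
from functools import cmp_to_key
from typing import List


def omega_compare(u: str, v: str) -> int:
    """Compare the infinite words u^ω and v^ω; return -1, 0, or 1."""
    length = len(u) + len(v)
    # Prefixes of length |u|+|v| of u^ω and v^ω decide the comparison
    pu = (u * (length // len(u) + 1))[:length]
    pv = (v * (length // len(v) + 1))[:length]
    return (pu > pv) - (pu < pv)


def original_ebwt(strings: List[str]) -> str:
    """Naive eBWT: sort all conjugates of all strings in omega-order."""
    conjugates = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    conjugates.sort(key=cmp_to_key(omega_compare))
    # Output the last character of each sorted conjugate
    return "".join(c[-1] for c in conjugates)


if __name__ == "__main__":
    print(original_ebwt(["ab", "aab"]))
    print(original_ebwt(["aab", "ab"]))  # same output: independent of input order
```

Because equal ω-words of primitive strings come from identical conjugates, ties in the sort always carry the same last character, so the output does not depend on the order of the input collection.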
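
Item 8 concerns matching statistics. Under the usual definition, $MS[i]$ is the length of the longest prefix of $P[i..]$ that occurs somewhere in the text $T$. The naive Python sketch below computes that array with plain substring searches and uses the fact that $MS[i+1] \ge MS[i] - 1$ to avoid restarting from zero. It only illustrates the definition and does not reflect the compressed-index approach discussed in item 8; `matching_statistics` is a hypothetical name.

```python
from typing import List


def matching_statistics(pattern: str, text: str) -> List[int]:
    """MS[i] = length of the longest prefix of pattern[i:] occurring in text."""
    ms: List[int] = []
    length = 0
    for i in range(len(pattern)):
        # Dropping the first character of the previous match leaves a
        # substring that still occurs in text, so extension can resume there
        length = max(length - 1, 0)
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("anab", "banana"))
```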
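
Items 9 and 13 describe prefix-free parsing, which produces a dictionary $D$ and a parse $P$ of overlapping phrases. The Python sketch below illustrates only the parse construction, under stated assumptions: trigger strings are length-$w$ windows whose (hypothetical) polynomial hash is divisible by a modulus $p$, the text is padded at both ends with $w$ copies of a sentinel `#` assumed absent from $S$, and consecutive phrases overlap by exactly $w$ characters. The parameter values, the hash, and the helper names `prefix_free_parse` and `unparse` are illustrative choices, not those of the papers, and the sketch does not compute the BWT from $D$ and $P$.

```python
from typing import List, Tuple


def window_hash(window: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    """Polynomial (Karp-Rabin style) hash of one window, computed from scratch."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod
    return h


def prefix_free_parse(s: str, w: int = 2, p: int = 3) -> Tuple[List[str], List[int]]:
    """Return (dictionary D, parse P); assumes '#' does not occur in s."""
    t = "#" * w + s + "#" * w
    # Trigger positions: the two padded ends plus windows whose hash is 0 mod p
    triggers = [0]
    for i in range(1, len(t) - w):
        if window_hash(t[i:i + w]) % p == 0:
            triggers.append(i)
    triggers.append(len(t) - w)
    # Each phrase runs from one trigger string through the end of the next one
    phrases = [t[a:b + w] for a, b in zip(triggers, triggers[1:])]
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    return dictionary, [rank[phrase] for phrase in phrases]


def unparse(dictionary: List[str], parse: List[int], w: int = 2) -> str:
    """Rebuild s by gluing phrases and dropping each w-character overlap."""
    text = dictionary[parse[0]]
    for r in parse[1:]:
        text += dictionary[r][w:]
    return text[w:len(text) - w]


if __name__ == "__main__":
    s = "GATTACATGATTACATGATTACA"
    D, P = prefix_free_parse(s)
    print(D, P)
```

On repetitive input, repeated phrases are stored once in $D$, which is the source of the savings the abstracts describe.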
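
Item 16 starts from the fixed-order de Bruijn graph $G_K$. Conventions differ across papers; the Python sketch below assumes one common choice in which nodes are the distinct $K$-mers of the input strings and there is an edge from a $K$-mer to the next one whenever they occur consecutively (overlapping by $K-1$ characters) in some string, so edges correspond to $(K+1)$-mers. It is a plain hash-map construction, not the succinct or variable-order representation of the paper, and `de_bruijn_graph` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def de_bruijn_graph(strings: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Adjacency sets over K-mers; an edge joins consecutive K-mers of a string."""
    graph: Dict[str, Set[str]] = {}
    for s in strings:
        for i in range(len(s) - k + 1):
            node = s[i:i + k]
            graph.setdefault(node, set())
            # Link to the following K-mer, which overlaps this one by K-1 chars
            if i + k < len(s):
                graph[node].add(s[i + 1:i + k + 1])
    return graph


if __name__ == "__main__":
    print(de_bruijn_graph(["ACGTA", "CGTT"], 3))
```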
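
Item 18 defines the Close to Most Strings problem: given same-length strings and a parameter $d$, find a string $x$ maximizing the number of inputs within Hamming distance $d$ of $x$. Purely to make the objective concrete, the Python sketch below solves tiny instances by exhaustive search over every length-$\ell$ string on a given alphabet, which takes exponential time. It is not the approximation approach of the paper, and the function names are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple


def hamming(x: str, y: str) -> int:
    """Number of positions where two same-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def close_to_most_strings(strings: Sequence[str], d: int, alphabet: str) -> Tuple[str, int]:
    """Brute force: a string x maximizing #{s : hamming(x, s) <= d}."""
    length = len(strings[0])  # all input strings are assumed to share this length
    best, best_count = "", -1
    for letters in product(alphabet, repeat=length):
        x = "".join(letters)
        count = sum(hamming(x, s) <= d for s in strings)
        if count > best_count:  # keep the first candidate reaching a new maximum
            best, best_count = x, count
    return best, best_count


if __name__ == "__main__":
    print(close_to_most_strings(["AAA", "AAT", "TTT"], 1, "AT"))
```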
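
Item 19 states the Consensus String with Outliers objective. For a fixed subset $S^*$, the sum $\sum_{s_i \in S^*} d(s_i, s)$ is minimized by choosing, in each column, a most frequent character among the strings of $S^*$, because every position contributes independently to a sum of Hamming distances. The brute-force Python sketch below relies on that observation and tries every subset of size $n-k$, taking $O(\binom{n}{k} n \ell)$ time. It only illustrates the objective and is not the algorithmic result of the paper; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def hamming(x: str, y: str) -> int:
    """Number of positions where two same-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def consensus_with_outliers(
    strings: Sequence[str], k: int, d: int
) -> Optional[Tuple[List[int], str]]:
    """Return (indices of S*, s) with total distance <= d, or None if none exists."""
    n = len(strings)
    assert 0 <= k < n, "need at least one non-outlier"
    for kept in combinations(range(n), n - k):
        subset = [strings[i] for i in kept]
        # A most frequent character per column minimizes the total Hamming distance
        center = "".join(Counter(column).most_common(1)[0][0] for column in zip(*subset))
        if sum(hamming(x, center) for x in subset) <= d:
            return list(kept), center
    return None


if __name__ == "__main__":
    print(consensus_with_outliers(["ACGT", "ACGA", "ACGT", "TTTT"], k=1, d=1))
```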