Showing 1–24 of 24 results for author: Boucher, C

  1. arXiv:2402.04258  [pdf, other]

    eess.IV cs.DB q-bio.QM

    MAPLES-DR: MESSIDOR Anatomical and Pathological Labels for Explainable Screening of Diabetic Retinopathy

    Authors: Gabriel Lepetit-Aimon, Clément Playout, Marie Carole Boucher, Renaud Duval, Michael H Brent, Farida Cheriet

    Abstract: Reliable automatic diagnosis of Diabetic Retinopathy (DR) and Macular Edema (ME) is an invaluable asset in improving the rate of monitored patients among at-risk populations and in enabling earlier treatments before the pathology progresses and threatens vision. However, the explainability of screening models is still an open question, and specifically designed datasets are required to support the…

    Submitted 19 January, 2024; originally announced February 2024.

    Comments: 1 pages, 7 figures

  2. arXiv:2309.01055  [pdf, other]

    cs.RO cs.AI

    Integration of Vision-based Object Detection and Grasping for Articulated Manipulator in Lunar Conditions

    Authors: Camille Boucher, Gustavo H. Diaz, Shreya Santra, Kentaro Uno, Kazuya Yoshida

    Abstract: The integration of vision-based frameworks to achieve lunar robot applications faces numerous challenges such as terrain configuration or extreme lighting conditions. This paper presents a generic task pipeline using object detection, instance segmentation and grasp detection, that can be used for various applications by using the results of these vision-based systems in a different way. We achiev…

    Submitted 2 September, 2023; originally announced September 2023.

  3. arXiv:2308.07809  [pdf, other]

    cs.DS

    Another virtue of wavelet forests?

    Authors: Christina Boucher, Travis Gagie, Aaron Hong, Yansong Li, Norbert Zeh

    Abstract: A wavelet forest for a text $T [1..n]$ over an alphabet $σ$ takes $n H_0 (T) + o (n \log σ)$ bits of space and supports access and rank on $T$ in $O (\log σ)$ time. Kärkkäinen and Puglisi (2011) implicitly introduced wavelet forests and showed that when $T$ is the Burrows-Wheeler Transform (BWT) of a string $S$, then a wavelet forest for $T$ occupies space bounded in terms of higher-order empirica…

    Submitted 15 August, 2023; originally announced August 2023.

  4. arXiv:2307.15845  [pdf, other]

    cond-mat.mtrl-sci

    Application of murexide as a capping agent for fabrication of magnetite anodes for supercapacitors: experimental and first-principle studies

    Authors: Coulton Boucher, Igor Zhitomirsky, Oleg Rubel

    Abstract: In this study, we investigate the effectiveness of murexide for surface modification of Fe$_3$O$_4$ nanoparticles to enhance the performance of multi-walled carbon nanotube-Fe$_3$O$_4$ supercapacitor anodes. Our experimental results demonstrate significant improvements in electrode performance when murexide is used as a capping or dispersing agent compared to the case with no additives. When murex…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 30 pages, 11 figures, 1 table

  5. arXiv:2305.05893  [pdf, other]

    cs.DS

    Acceleration of FM-index Queries Through Prefix-free Parsing

    Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, Travis Gagie

    Abstract: FM-indexes are a crucial data structure in DNA alignment, for example, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word-boundaries, however, it is necessary to…

    Submitted 10 May, 2023; originally announced May 2023.

  6. arXiv:2207.07458  [pdf]

    stat.ML cs.LG

    Joint Application of the Target Trial Causal Framework and Machine Learning Modeling to Optimize Antibiotic Therapy: Use Case on Acute Bacterial Skin and Skin Structure Infections due to Methicillin-resistant Staphylococcus aureus

    Authors: Inyoung Jun, Simone Marini, Christina A. Boucher, J. Glenn Morris, Jiang Bian, Mattia Prosperi

    Abstract: Bacterial infections are responsible for high mortality worldwide. Antimicrobial resistance underlying the infection and the patient's multifaceted clinical status can hamper the correct choice of antibiotic treatment. Randomized clinical trials provide average treatment effect estimates but are not ideal for risk stratification and optimization of therapeutic choice, i.e., individualized treatment e…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: This is the Proceedings of the KDD workshop on Applied Data Science for Healthcare (DSHealth 2022), which was held in Washington, D.C., on August 14, 2022

  7. arXiv:2107.03383  [pdf, other]

    q-bio.GN cs.LG

    Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions

    Authors: Mattia Prosperi, Simone Marini, Christina Boucher, Jiang Bian

    Abstract: Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR predict…

    Submitted 23 July, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: In DSHealth '21: Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare, Aug 14--18, 2021, Virtual, 5 pages

  8. arXiv:2106.11191  [pdf, other]

    cs.DS

    Computing the original eBWT faster, simpler, and with less memory

    Authors: Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, Marinella Sciortino

    Abstract: Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings; however, since its introduction, it has been used more generally to describe any BWT of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algo…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: 20 pages, 5 figures, 1 table

  9. arXiv:2011.05610  [pdf, ps, other]

    cs.DS

    PHONI: Streamed Matching Statistics with Multi-Genome References

    Authors: Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi

    Abstract: Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this pape…

    Submitted 11 February, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: Our code is available at https://github.com/koeppl/phoni

  10. arXiv:2006.11687  [pdf, other]

    cs.DS

    PFP Data Structures

    Authors: Christina Boucher, Ondřej Cvacho, Travis Gagie, Jan Holub, Giovanni Manzini, Gonzalo Navarro, Massimiliano Rossi

    Abstract: Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string $S$, it produces a dictionary $D$ and a parse $P$ of overlapping phrases such that $\mathrm{BWT} (S)$ can be computed from $D$ and $P$ in time and workspace bounded in terms of their combined size…

    Submitted 20 June, 2020; originally announced June 2020.

  11. arXiv:1912.00913  [pdf]

    stat.AP cs.SE

    Automated metrics calculation in a dynamic heterogeneous environment

    Authors: Craig Boucher, Ulf Knoblich, Daniel Miller, Sasha Patotski, Amin Saied, Venky Venkateshaiah

    Abstract: A consistent theme in software experimentation at Microsoft has been solving problems of experimentation at scale for a diverse set of products. Running experiments at scale (i.e., many experiments on many users) has become state of the art across the industry. However, providing a single platform that allows software experimentation in a highly heterogenous and constantly evolving ecosystem remai…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: 5 pages, MIT Code

  12. arXiv:1908.01263  [pdf, ps, other]

    cs.DS q-bio.GN

    Matching reads to many genomes with the $r$-index

    Authors: Taher Mun, Alan Kuhnle, Christina Boucher, Travis Gagie, Ben Langmead, Giovanni Manzini

    Abstract: The $r$-index is a tool for compressed indexing of genomic databases for exact pattern matching, which can be used to completely align reads that perfectly match some part of a genome in the database or to find seeds for reads that do not. This paper shows how to download and install the programs ri-buildfasta and ri-align; how to call ri-buildfasta on a FASTA file to build an $r$-index for that f…

    Submitted 3 August, 2019; originally announced August 2019.

  13. arXiv:1902.02397  [pdf]

    stat.AP

    Winning Is Not Everything: A contextual analysis of hockey face-offs

    Authors: Nick Czuzoj-Shulman, David Yu, Christopher Boucher, Luke Bornn, Mehrsan Javan

    Abstract: This paper takes a different approach to evaluating face-offs in ice hockey. Instead of looking at win percentages, the de facto measure of successful face-off takers for decades, it focuses on the game events following the face-off and how directionality, clean wins, and player handedness play a significant role in creating value. This will demonstrate how not all face-off wins are made equal: some…

    Submitted 6 February, 2019; originally announced February 2019.

    Comments: Accepted paper for the 2019 Sloan Sports Analytics Conference

  14. arXiv:1902.02020  [pdf]

    stat.AP

    Playing Fast Not Loose: Evaluating team-level pace of play in ice hockey using spatio-temporal possession data

    Authors: David Yu, Christopher Boucher, Luke Bornn, Mehrsan Javan

    Abstract: Pace of play is an important characteristic in hockey as well as other team sports. We provide the first comprehensive study of pace within the sport of hockey, focusing on how teams and players impact pace in different regions of the ice, and the resultant effect on other aspects of the game. First we examined how pace of play varies across the surface of the rink, across different periods, at…

    Submitted 5 February, 2019; originally announced February 2019.

    Comments: Accepted Paper for the 2019 Sloan Sports Analytics Conference

  15. arXiv:1811.06933  [pdf, other]

    cs.DS

    Efficient Construction of a Complete Index for Pan-Genomics Read Alignment

    Authors: Alan Kuhnle, Taher Mun, Christina Boucher, Travis Gagie, Ben Langmead, Giovanni Manzini

    Abstract: While short read aligners, which predominantly use the FM-index, are able to easily index one or a few human genomes, they do not scale well to indexing databases containing thousands of genomes. To understand why, it helps to examine the main components of the FM-index in more detail, which is a rank data structure over the Burrows-Wheeler Transform (BWT) of the string that will allow us to find…

    Submitted 16 November, 2018; originally announced November 2018.

  16. arXiv:1808.07547  [pdf, other]

    cond-mat.mes-hall

    Electrostatic enhancement factor for the coagulation of silicon nanoparticles in low-temperature plasmas

    Authors: Benjamin Santos, Laura Cacot, Claude Boucher, François Vidal

    Abstract: The coagulation enhancement factor due to electrostatic (Coulomb and polarization-induced) interaction between silicon nanoparticles was numerically computed for different nanoparticle sizes and charges in typical low-temperature argon-silane plasma conditions. We used a rigorous formulation, based on multipole moment coefficients, to describe the complete electrostatic interaction between dielec…

    Submitted 22 August, 2018; originally announced August 2018.

    Comments: submitted to PSST

  17. arXiv:1803.11245  [pdf, other]

    cs.DS

    Prefix-Free Parsing for Building Big BWTs

    Authors: Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, Taher Mun

    Abstract: High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive---a characteristic that can be exploited to ease the computation of…

    Submitted 16 November, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: Preliminary version appeared at WABI '18; full version submitted to a journal

  18. arXiv:1506.03262  [pdf, other]

    cs.DS

    Relative Select

    Authors: Christina Boucher, Alexander Bowe, Travis Gagie, Giovanni Manzini, Jouni Sirén

    Abstract: Motivated by the problem of storing coloured de Bruijn graphs, we show how, if we can already support fast select queries on one string, then we can store a little extra information and support fairly fast select queries on a similar string.

    Submitted 10 June, 2015; originally announced June 2015.

  19. arXiv:1411.5890  [pdf, other]

    q-bio.GN cs.CE

    Misassembly Detection using Paired-End Sequence Reads and Optical Mapping Data

    Authors: Martin D. Muggli, Simon J. Puglisi, Roy Ronen, Christina Boucher

    Abstract: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method that will enhance the quality of draft genomes by identifying and removing misassembly errors using paired short read sequence data and optical mapping data. We apply our method to various assemblies of the loblolly pine and Francisella tularensis genomes. Our results de…

    Submitted 20 November, 2014; originally announced November 2014.

    Comments: 14 pages, 4 figures. Submitted to RECOMB. Preparing to submit to Genome Biology

  20. arXiv:1411.2718  [pdf, other]

    cs.DS q-bio.GN q-bio.QM

    Variable-Order de Bruijn Graphs

    Authors: Christina Boucher, Alex Bowe, Travis Gagie, Simon J. Puglisi, Kunihiko Sadakane

    Abstract: The de Bruijn graph $G_K$ of a set of strings $S$ is a key data structure in genome assembly that represents overlaps between all the $K$-length substrings of $S$. Construction and navigation of the graph is a space and time bottleneck in practice and the main hurdle for assembling large, eukaryote genomes. This problem is compounded by the fact that state-of-the-art assemblers do not build the de…

    Submitted 17 November, 2014; v1 submitted 11 November, 2014; originally announced November 2014.

    Comments: Conference submission, 10 pages, +minor corrections

  21. arXiv:1408.5592  [pdf, other]

    cs.CE

    HyDA-Vista: Towards Optimal Guided Selection of k-mer Size for Sequence Assembly

    Authors: Seyed Basir Shariat Razavi, Narjes Sadat Movahedi Tabrizi, Hamidreza Chitsaz, Christina Boucher

    Abstract: Motivation: Intimately tied to assembly quality is the complexity of the de Bruijn graph built by the assembler. Thus, there have been many paradigms developed to decrease the complexity of the de Bruijn graph. One obvious combinatorial paradigm for this is to allow the value of $k$ to vary; having a larger value of $k$ where the graph is more complex and a smaller value of $k$ where the graph wou…

    Submitted 24 August, 2014; originally announced August 2014.

    Comments: 11 pages, 1 figure, 1 table

  22. arXiv:1206.5846  [pdf, other]

    q-bio.QM q-bio.GN

    SeeSite: Efficiently Finding Co-occurring Splice Sites and Exon Splicing Enhancers

    Authors: Christine Lo, Boyko Kakaradov, Daniel Lokshtanov, Christina Boucher

    Abstract: The problem of identifying splice sites consists of two sub-problems: finding their boundaries, and characterizing their sequence markers. Other splicing elements---including enhancers and silencers---that occur in the intronic and exonic regions play an important role in splicing activity. Existing methods for detecting splicing elements are limited to finding either splice sites or enhancers an…

    Submitted 25 June, 2012; originally announced June 2012.

  23. arXiv:1202.2820  [pdf, other]

    cs.DS

    On Approximating String Selection Problems with Outliers

    Authors: Christina Boucher, Gad M. Landau, Avivit Levy, David Pritchard, Oren Weimann

    Abstract: Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We pro…

    Submitted 13 February, 2012; originally announced February 2012.

  24. arXiv:1111.0376  [pdf, other]

    cs.DS

    Outlier Detection for DNA Fragment Assembly

    Authors: Christina Boucher, Christine Lo, Daniel Lokshtanov

    Abstract: Given $n$ length-$\ell$ strings $S =\{s_1, ..., s_n\}$ over a constant size alphabet $Σ$ together with parameters $d$ and $k$, the objective in the {\em Consensus String with Outliers} problem is to find a subset $S^*$ of $S$ of size $n-k$ and a string $s$ such that $\sum_{s_i \in S^*} d(s_i, s) \leq d$. Here $d(x, y)$ denotes the Hamming distance between the two strings $x$ and $y$. We prove 1.…

    Submitted 7 November, 2011; v1 submitted 1 November, 2011; originally announced November 2011.

    Comments: 29 pages, 1 figure