Showing 1–24 of 24 results for author: Kari, L

Searching in archive cs.
  1. arXiv:2407.02538  [pdf, other]

    q-bio.GN cs.LG

    CGRclust: Chaos Game Representation for Twin Contrastive Clustering of Unlabelled DNA Sequences

    Authors: Fatemeh Alipour, Kathleen A. Hill, Lila Kari

    Abstract: This study proposes CGRclust, a novel combination of unsupervised twin contrastive clustering of Chaos Game Representations (CGR) of DNA sequences, with convolutional neural networks (CNNs). To the best of our knowledge, CGRclust is the first method to use unsupervised learning for image classification (herein applied to two-dimensional CGR images) for clustering datasets of DNA sequences. CGRclus…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 29 pages, 4 figures

    ACM Class: F.2.2, I.2.7

  2. arXiv:2406.12723  [pdf, other]

    cs.LG

    BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

    Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

    Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by includin…

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2311.02401  [pdf, other]

    cs.LG

    BarcodeBERT: Transformers for Biodiversity Analysis

    Authors: Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Scott C. Lowe, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Graham W. Taylor

    Abstract: Understanding biodiversity is a global challenge, in which DNA barcodes - short snippets of DNA that cluster by species - play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Main text: 5 pages, Total: 9 pages, 2 figures, accepted at the 4th Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2023)

  4. arXiv:1909.02512  [pdf, ps, other]

    cs.FL

    Descriptional Complexity of Semi-Simple Splicing Systems

    Authors: Lila Kari, Timothy Ng

    Abstract: Splicing systems are generative mechanisms introduced by Tom Head in 1987 to model the biological process of DNA recombination. The computational engine of a splicing system is the "splicing operation", a cut-and-paste binary string operation defined by a set of "splicing rules" $r = (α_1, α_2 ; α_3, α_4)$ where $α_1, α_2, α_3, α_4$ are words over an alphabet $Σ$. For two strings…

    Submitted 5 September, 2019; originally announced September 2019.

  5. arXiv:1710.06000  [pdf, ps, other]

    cs.FL

    State Complexity of Overlap Assembly

    Authors: Janusz Brzozowski, Lila Kari, Bai Li, Marek Szykuła

    Abstract: The state complexity of a regular language $L_m$ is the number $m$ of states in a minimal deterministic finite automaton (DFA) accepting $L_m$. The state complexity of a regularity-preserving binary operation on regular languages is defined as the maximal state complexity of the result of the operation where the two operands range over all languages of state complexities $\le m$ and…

    Submitted 11 December, 2018; v1 submitted 16 October, 2017; originally announced October 2017.

  6. arXiv:1503.00035  [pdf, ps, other]

    cs.FL cs.CC

    Transducer Descriptions of DNA Code Properties and Undecidability of Antimorphic Problems

    Authors: Lila Kari, Stavros Konstantinidis, Steffen Kopecki

    Abstract: This work concerns formal descriptions of DNA code properties, and builds on previous work on transducer descriptions of classic code properties and on trajectory descriptions of DNA code properties. This line of research allows us to give a property as input to an algorithm, in addition to any regular language, which can then answer questions about the language and the property. Here we define DN…

    Submitted 27 February, 2015; originally announced March 2015.

  7. arXiv:1406.1041  [pdf, ps, other]

    cs.FL

    An efficient algorithm for computing the edit distance of a regular language via input-altering transducers

    Authors: Lila Kari, Stavros Konstantinidis, Steffen Kopecki, Meng Yang

    Abstract: We revisit the problem of computing the edit distance of a regular language given via an NFA. This problem relates to the inherent maximal error-detecting capability of the language in question. We present an efficient algorithm for solving this problem which executes in time $O(r^2n^2d)$, where $r$ is the cardinality of the alphabet involved, $n$ is the number of transitions in the given NFA, and…

    Submitted 4 June, 2014; originally announced June 2014.

    MSC Class: 68Q45

  8. arXiv:1404.0967  [pdf, ps, other]

    cs.CC

    Binary pattern tile set synthesis is NP-hard

    Authors: Lila Kari, Steffen Kopecki, Pierre-Étienne Meunier, Matthew J. Patitz, Shinnosuke Seki

    Abstract: In the field of algorithmic self-assembly, a long-standing unproven conjecture has been that of the NP-hardness of binary pattern tile set synthesis (2-PATS). The $k$-PATS problem is that of designing a tile assembly system with the smallest number of tile types which will self-assemble an input pattern of $k$ colors. Of both theoretical and practical significance, $k$-PATS has been studied in a s…

    Submitted 3 April, 2014; originally announced April 2014.

  9. arXiv:1307.3755  [pdf, ps, other]

    q-bio.GN cs.CV q-bio.PE q-bio.QM

    Map of Life: Measuring and Visualizing Species' Relatedness with "Molecular Distance Maps"

    Authors: Lila Kari, Kathleen A. Hill, Abu Sadat Sayem, Nathaniel Bryans, Katelyn Davis, Nikesh S. Dattani

    Abstract: We propose a novel combination of methods that (i) portrays quantitative characteristics of a DNA sequence as an image, (ii) computes distances between these images, and (iii) uses these distances to output a map wherein each sequence is a point in a common Euclidean space. In the resulting "Molecular Distance Map" each point signifies a DNA sequence, and the geometric distance between any two poi…

    Submitted 14 July, 2013; originally announced July 2013.

    Comments: 13 pages, 8 figures. Funded by: NSERC/CRSNG (Natural Science & Engineering Research Council of Canada / Conseil de recherches en sciences naturelles et en génie du Canada), and the Oxford University Press. Acknowledgements: Ronghai Tu, Tao Tao, Steffen Kopecki, Andre Lachance, Jeremy McNeil, Greg Thorn, Oxford University Mathematical Institute

    MSC Class: 92; 68 ACM Class: J.3; J.2; I.4; I.5; H.3.3

  10. arXiv:1306.3257  [pdf, ps, other]

    cs.CC

    3-color Bounded Patterned Self-assembly

    Authors: Lila Kari, Steffen Kopecki, Shinnosuke Seki

    Abstract: Patterned self-assembly tile set synthesis (PATS) is the problem of finding a minimal tile set which uniquely self-assembles into a given pattern. Czeizler and Popa proved the NP-completeness of PATS and Seki showed that the PATS problem is already NP-complete for patterns with 60 colors. In search for the minimal number of colors such that PATS remains NP-complete, we introduce multiple bound PATS…

    Submitted 13 June, 2013; originally announced June 2013.

  11. arXiv:1302.2840  [pdf, ps, other]

    cs.DM cs.FL math.CO

    Hypergraph Automata: A Theoretical Model for Patterned Self-assembly

    Authors: Lila Kari, Steffen Kopecki, Amirhossein Simjour

    Abstract: Patterned self-assembly is a process whereby coloured tiles self-assemble to build a rectangular coloured pattern. We propose self-assembly (SA) hypergraph automata as an automata-theoretic model for patterned self-assembly. We investigate the computational power of SA-hypergraph automata and show that for every recognizable picture language, there exists an SA-hypergraph automaton that accepts th…

    Submitted 12 February, 2013; originally announced February 2013.

    Comments: 25 pages

  12. arXiv:1112.4897  [pdf, ps, other]

    cs.FL

    Deciding Whether a Regular Language is Generated by a Splicing System

    Authors: Lila Kari, Steffen Kopecki

    Abstract: Splicing as a binary word/language operation is inspired by the DNA recombination under the action of restriction enzymes and ligases, and was first introduced by Tom Head in 1987. Shortly thereafter, it was proven that the languages generated by (finite) splicing systems form a proper subclass of the class of regular languages. However, the question of whether or not one can decide if a given reg…

    Submitted 30 August, 2012; v1 submitted 20 December, 2011; originally announced December 2011.

  13. arXiv:1110.0760  [pdf, ps, other]

    cs.FL

    Iterated Hairpin Completions of Non-crossing Words

    Authors: Lila Kari, Steffen Kopecki, Shinnosuke Seki

    Abstract: Iterated hairpin completion is an operation on formal languages that is inspired by the hairpin formation in DNA biochemistry. Iterated hairpin completion of a word (or more precisely a singleton language) is always a context-sensitive language and for some words it is known to be non-context-free. However, it is unknown whether regularity of iterated hairpin completion of a given word is decidabl…

    Submitted 4 October, 2011; originally announced October 2011.

  14. arXiv:1104.2385  [pdf, ps, other]

    cs.FL

    On the regularity of iterated hairpin completion of a single word

    Authors: Lila Kari, Steffen Kopecki, Shinnosuke Seki

    Abstract: Hairpin completion is an abstract operation modeling a DNA bio-operation which receives as input a DNA strand $w = xαy\bar{α}$, and outputs $w' = xαy\bar{α}\bar{x}$, where $\bar{x}$ denotes the Watson-Crick complement of $x$. In this paper, we focus on the problem of finding conditions under which the iterated hairpin completion of a given word is regular. According to the numbers of words $α$ a…

    Submitted 13 April, 2011; originally announced April 2011.

    Comments: 17 pages, 1 figure, submitted to Fundamenta Informaticae

  15. Ciliate Gene Unscrambling with Fewer Templates

    Authors: Lila Kari, Afroza Rahman

    Abstract: One of the theoretical models proposed for the mechanism of gene unscrambling in some species of ciliates is the template-guided recombination (TGR) system by Prescott, Ehrenfeucht and Rozenberg which has been generalized by Daley and McQuillan from a formal language theory perspective. In this paper, we propose a refinement of this model that generates regular languages using the iterated TGR sys…

    Submitted 10 August, 2010; originally announced August 2010.

    Comments: In Proceedings DCFS 2010, arXiv:1008.1270

    Journal ref: EPTCS 31, 2010, pp. 120-129

  16. State Complexity of Catenation Combined with Star and Reversal

    Authors: Bo Cui, Yuan Gao, Lila Kari, Sheng Yu

    Abstract: This paper is a continuation of our research work on state complexity of combined operations. Motivated by applications, we study the state complexities of two particular combined operations: catenation combined with star and catenation combined with reversal. We show that the state complexities of both of these combined operations are considerably less than the compositions of the state complexit…

    Submitted 10 August, 2010; originally announced August 2010.

    Comments: In Proceedings DCFS 2010, arXiv:1008.1270

    Journal ref: EPTCS 31, 2010, pp. 58-67

  17. arXiv:1006.4646  [pdf, ps, other]

    cs.FL

    State Complexity of Two Combined Operations: Reversal-Catenation and Star-Catenation

    Authors: Bo Cui, Yuan Gao, Lila Kari, Sheng Yu

    Abstract: In this paper, we show that, due to the structural properties of the resulting automaton obtained from a prior operation, the state complexity of a combined operation may not be equal but close to the mathematical composition of the state complexities of its component operations. In particular, we provide two witness combined operations: reversal combined with catenation and star combined with cat…

    Submitted 23 June, 2010; originally announced June 2010.

    Comments: 20 pages, 7 figures

  18. arXiv:1006.2897  [pdf, ps, other]

    cs.CC cs.DS

    The Power of Nondeterminism in Self-Assembly

    Authors: Nathaniel Bryans, Ehsan Chiniforooshan, David Doty, Lila Kari, Shinnosuke Seki

    Abstract: We investigate the role of nondeterminism in Winfree's abstract Tile Assembly Model (aTAM), which was conceived to model artificial molecular self-assembling systems constructed from DNA. Of particular practical importance is to find tile systems that minimize resources such as the number of distinct tile types, each of which corresponds to a set of DNA strands that must be custom-synthesized in a…

    Submitted 25 November, 2010; v1 submitted 15 June, 2010; originally announced June 2010.

    Comments: Accepted to SODA 2011. The previous version of this paper (which appears in the SODA proceedings) had open questions about computing the minimum number of tile types to weakly self-assemble a set. The answer to these questions is "no", by a very simple imitation of the proof that Kolmogorov complexity is uncomputable based on the Berry paradox. These open questions have been removed

  19. Scalable, Time-Responsive, Digital, Energy-Efficient Molecular Circuits using DNA Strand Displacement

    Authors: Ehsan Chiniforooshan, David Doty, Lila Kari, Shinnosuke Seki

    Abstract: We propose a novel theoretical biomolecular design to implement any Boolean circuit using the mechanism of DNA strand displacement. The design is scalable: all species of DNA strands can in principle be mixed and prepared in a single test tube, rather than requiring separate purification of each species, which is a barrier to large-scale synthesis. The design is time-responsive: the concentratio…

    Submitted 18 March, 2010; v1 submitted 16 March, 2010; originally announced March 2010.

    Comments: version 2: the paper itself is unchanged from version 1, but the arXiv software stripped some asterisk characters out of the abstract whose purpose was to highlight words. These characters have been replaced with underscores in version 2. The arXiv software also removed the second paragraph of the abstract, which has been (attempted to be) re-inserted. Also, although the secondary subject is "Soft Condensed Matter", this classification was chosen by the arXiv moderators after submission, not chosen by the authors. The authors consider this submission to be a theoretical computer science paper.

    ACM Class: F.1.1

  20. arXiv:1002.4996  [pdf, ps, other]

    cs.DM

    Triangular Self-Assembly

    Authors: Lila Kari, Shinnosuke Seki, Zhi Xu

    Abstract: We discuss the self-assembly system of triangular tiles instead of square tiles, in particular right triangular tiles and equilateral triangular tiles. We show that the triangular tile assembly system, either deterministic or non-deterministic, has the same power as the square tile assembly system in computation, which is Turing universal. By providing counter-examples, we show that the triangul…

    Submitted 26 February, 2010; originally announced February 2010.

    ACM Class: J.3

  21. arXiv:1002.4084  [pdf, ps, other]

    cs.CC

    Properties of Pseudo-Primitive Words and their Applications

    Authors: Lila Kari, Benoît Masson, Shinnosuke Seki

    Abstract: A pseudo-primitive word with respect to an antimorphic involution θ is a word which cannot be written as a catenation of occurrences of a strictly shorter word t and θ(t). Properties of pseudo-primitive words are investigated in this paper. These properties link pseudo-primitive words with essential notions in combinatorics on words such as primitive words, (pseudo)-palindromes, and (pseudo)-comm…

    Submitted 22 February, 2010; originally announced February 2010.

    Comments: Submitted to International Journal of Foundations of Computer Science

  22. arXiv:1002.3769  [pdf, ps, other]

    cs.CC

    Polyominoes Simulating Arbitrary-Neighborhood Zippers and Tilings

    Authors: Lila Kari, Benoît Masson

    Abstract: This paper provides a bridge between the classical tiling theory and the complex neighborhood self-assembling situations that exist in practice. The neighborhood of a position in the plane is the set of coordinates which are considered adjacent to it. This includes classical neighborhoods of size four, as well as arbitrarily complex neighborhoods. A generalized tile system consists of a set of til…

    Submitted 11 April, 2011; v1 submitted 19 February, 2010; originally announced February 2010.

    Comments: Submitted to Theoretical Computer Science

  23. Negative Interactions in Irreversible Self-Assembly

    Authors: David Doty, Lila Kari, Benoit Masson

    Abstract: This paper explores the use of negative (i.e., repulsive) interactions in the abstract Tile Assembly Model defined by Winfree. Winfree postulated negative interactions to be physically plausible in his Ph.D. thesis, and Reif, Sahu, and Yin explored their power in the context of reversible attachment operations. We explore the power of negative interactions with irreversible attachments, and we achie…

    Submitted 13 February, 2010; originally announced February 2010.

    ACM Class: F.1.1; F.1.1; F.1.m; F.m; J.2

  24. arXiv:0911.2233  [pdf, ps, other]

    cs.FL cs.DS

    Pseudo-Power Avoidance

    Authors: Ehsan Chiniforooshan, Lila Kari, Zhi Xu

    Abstract: Repetition avoidance has been studied since Thue's work. In this paper, we considered another type of repetition, which is called pseudo-power. This concept is inspired by Watson-Crick complementarity in DNA sequence and is defined over an antimorphic involution $φ$. We first classify the alphabet $Σ$ and the antimorphic involution $φ$, under which there exists sufficiently long pseudo-$k$th-pow…

    Submitted 11 November, 2009; originally announced November 2009.

    ACM Class: F.4.3; J.3