Showing 1–19 of 19 results for author: Arimura, H

  1. arXiv:2405.00131  [pdf, other]

    cs.DS cs.CC cs.FL

    Finding Diverse Strings and Longest Common Subsequences in a Graph

    Authors: Yuto Shida, Giulia Punzi, Yasuaki Kobayashi, Takeaki Uno, Hiroki Arimura

    Abstract: In this paper, we study for the first time the Diverse Longest Common Subsequences (LCSs) problem under Hamming distance. Given a set of a constant number of input strings, the problem asks to decide if there exists some subset $\mathcal X$ of $K$ longest common subsequences whose diversity is no less than a specified threshold $Δ$, where we consider two types of diversities of a set $\mathcal X$…

    Submitted 10 June, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

    Comments: Proceedings of 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024), Leibniz International Proceedings in Informatics, Vol.296, pp.21:0-21:17, June 2024

  2. arXiv:2402.18090  [pdf, other]

    cs.DS cs.FL

    Computing Minimal Absent Words and Extended Bispecial Factors with CDAWG Space

    Authors: Shunsuke Inenaga, Takuya Mieno, Hiroki Arimura, Mitsuru Funakoshi, Yuta Fujishige

    Abstract: A string $w$ is said to be a minimal absent word (MAW) for a string $S$ if $w$ does not occur in $S$ and any proper substring of $w$ occurs in $S$. We focus on non-trivial MAWs, which are of length at least 2. Finding such non-trivial MAWs for a given string is motivated by applications in bioinformatics and data compression. Fujishige et al. [TCS 2023] proposed a data structure of size $Θ(n)$ tha…

    Submitted 19 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted for IWOCA 2024
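
The MAW definition in the abstract above can be made concrete with a small brute-force sketch. This is not the CDAWG-based method of the paper; it is a quadratic-space illustration only, and the function names `substrings` and `nontrivial_maws` are hypothetical. It relies on the observation that a word $w = a u b$ of length at least 2 is a MAW exactly when $w$ is absent while both $a u$ and $u b$ occur.

```python
def substrings(s):
    """Return the set of all non-empty substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def nontrivial_maws(s):
    """Brute-force list of the minimal absent words of length >= 2 for s.

    Illustration only: builds every substring explicitly, so it is
    suitable for short strings, not for real inputs.
    """
    alphabet = sorted(set(s))  # a MAW of length >= 2 uses only letters of s
    present = substrings(s)
    maws = set()
    for x in present:          # x plays the role of a*u (it occurs in s)
        for b in alphabet:
            w = x + b
            # w is a MAW if it is absent and its longest proper suffix u*b occurs
            if w not in present and w[1:] in present:
                maws.add(w)
    return sorted(maws)


if __name__ == "__main__":
    print(nontrivial_maws("abaab"))
```

For the string `abaab`, the sketch reports `aaa`, `aaba`, `bab`, and `bb`: each is absent from the string, while every proper substring of each occurs in it.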

  3. arXiv:2308.02269  [pdf, other]

    cs.DS cs.FL cs.IR

    Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph

    Authors: Hiroki Arimura, Shunsuke Inenaga, Yasuaki Kobayashi, Yuto Nakashima, Mizuki Sue

    Abstract: In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size $e$ for a text $T$ of length $n$ into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size $r$, the irreducible PLCP array of size $r$, and the qu…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: The short version of this paper will appear in SPIRE 2023, Pisa, Italy, September 26-28, 2023, Lecture Notes in Computer Science, Springer

  4. arXiv:2305.07259  [pdf, other]

    cs.DS

    Minimum Consistent Subset for Trees Revisited

    Authors: Hiroki Arimura, Tatsuya Gima, Yasuaki Kobayashi, Hiroomi Nochide, Yota Otachi

    Abstract: In a vertex-colored graph $G = (V, E)$, a subset $S \subseteq V$ is said to be consistent if every vertex has a nearest neighbor in $S$ with the same color. The problem of computing a minimum cardinality consistent subset of a graph is known to be NP-hard. On the positive side, Dey et al. (FCT 2021) show that this problem is solvable in polynomial time when input graphs are restricted to bi-colore…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: 9 pages, 3 figures

  5. arXiv:2204.11285  [pdf, other]

    cs.LG

    Computing the Collection of Good Models for Rule Lists

    Authors: Kota Mata, Kentaro Kanamori, Hiroki Arimura

    Abstract: Since the seminal paper by Breiman in 2001, who pointed out a potential harm of prediction multiplicities from the view of explainable AI, global analysis of a collection of all good models, also known as a 'Rashomon set,' has attracted much attention in recent years. Since finding such a set of good models is a hard computational problem, there have been only a few algorithms for the prob…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: 16 pages, 4 figures, to appear in the 18th International Conference on Machine Learning and Data Mining (MLDM 2022), July 16 - 21, 2022, New York, USA

  6. arXiv:2202.04349  [pdf, other]

    cs.DS

    Cartesian Tree Subsequence Matching

    Authors: Tsubasa Oizumi, Takeshi Kai, Takuya Mieno, Shunsuke Inenaga, Hiroki Arimura

    Abstract: Park et al. [TCS 2020] observed that the similarity between two (numerical) strings can be captured by their Cartesian trees: the Cartesian tree of a string is a binary tree recursively constructed by picking the smallest value of the string as the root of the tree. Two strings of equal length are said to Cartesian-tree match if their Cartesian trees are isomorphic. Park et al. [TCS 2020] introdu…

    Submitted 14 April, 2022; v1 submitted 9 February, 2022; originally announced February 2022.
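
As a rough illustration of the Cartesian-tree matching notion described in the abstract above, the sketch below builds each tree with the standard stack-based linear-time construction and compares the resulting parent arrays. It covers only equal-length matching, not the subsequence matching problem of the paper. The names `cartesian_tree_parents` and `cartesian_tree_match` are hypothetical, and breaking ties in favor of the leftmost minimum is an assumption, since the truncated abstract does not state a tie-breaking rule.

```python
def cartesian_tree_parents(values):
    """Return parent[i] for every position i in the Cartesian tree of values.

    The root gets parent -1. Ties are broken in favor of the leftmost
    minimum (an assumption made for this sketch).
    """
    parent = [-1] * len(values)
    stack = []  # positions on the rightmost path, values non-decreasing
    for i, v in enumerate(values):
        last = -1
        # Pop strictly larger values; they end up in the left subtree of i.
        while stack and values[stack[-1]] > v:
            last = stack.pop()
        if last != -1:
            parent[last] = i
        if stack:
            parent[i] = stack[-1]  # i becomes the right child of the stack top
        stack.append(i)
    return parent


def cartesian_tree_match(a, b):
    """True if a and b have equal length and identical Cartesian trees."""
    return len(a) == len(b) and cartesian_tree_parents(a) == cartesian_tree_parents(b)


if __name__ == "__main__":
    print(cartesian_tree_match([2, 1, 3], [5, 0, 9]))  # same tree shape
    print(cartesian_tree_match([1, 2, 3], [3, 2, 1]))  # different tree shape
```

Comparing parent arrays is one simple way to test that two trees over the same positions coincide; other encodings would serve equally well.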

  7. Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization

    Authors: Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike, Kento Uemura, Hiroki Arimura

    Abstract: Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One of the popular methods is Counterfactual Explanation (CE), also known as Actionable Recourse, which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining one's desired dec…

    Submitted 14 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: 20 pages, 5 figures, to appear in the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

  8. arXiv:2004.08015  [pdf, other]

    cs.DB

    Efficient Constrained Pattern Mining Using Dynamic Item Ordering for Explainable Classification

    Authors: Hiroaki Iwashita, Takuya Takagi, Hirofumi Suzuki, Keisuke Goto, Kotaro Ohori, Hiroki Arimura

    Abstract: Learning of interpretable classification models has been attracting much attention for the last few years. Discovery of succinct and contrasting patterns that can highlight the differences between the two classes is very important. Such patterns are useful for human experts, and can be used to construct powerful classifiers. In this paper, we consider mining of minimal emerging patterns from high-…

    Submitted 16 April, 2020; originally announced April 2020.

  9. Constant Amortized Time Enumeration of Independent Sets for Graphs with Bounded Clique Number

    Authors: Kazuhiro Kurita, Kunihiro Wasa, Hiroki Arimura, Takeaki Uno

    Abstract: In this study, we address the independent set enumeration problem. Although several efficient enumeration algorithms and careful analyses have been proposed for maximal independent sets, no fine-grained analysis has been given for the non-maximal variant. As the main result, we propose an algorithm $\texttt{EIS}$ for the non-maximal variant that runs in $O(q)$ amortized time and linear space, wh…

    Submitted 9 July, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

  10. arXiv:1906.01876  [pdf, other]

    cs.LG stat.ML

    Enumeration of Distinct Support Vectors for Interactive Decision Making

    Authors: Kentaro Kanamori, Satoshi Hara, Masakazu Ishihata, Hiroki Arimura

    Abstract: In conventional prediction tasks, a machine learning algorithm outputs a single best model that globally optimizes its objective function, which typically is accuracy. Therefore, users cannot access the other models explicitly. In contrast, multiple model enumeration is attracting increasing interest in non-standard machine learning applications where other criteria, e.g., interpretability or…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: presented at 2019 ICML Workshop on Human in the Loop Learning (HILL 2019), Long Beach, USA

  11. An Efficient Algorithm for Enumerating Chordal Bipartite Induced Subgraphs in Sparse Graphs

    Authors: Kazuhiro Kurita, Kunihiro Wasa, Hiroki Arimura, Takeaki Uno

    Abstract: In this paper, we propose a characterization of chordal bipartite graphs and an efficient enumeration algorithm for chordal bipartite induced subgraphs. A chordal bipartite graph is a bipartite graph without induced cycles of length six or more. It is known that the incidence graph of a hypergraph is a chordal bipartite graph if and only if the hypergraph is $β$-acyclic. As the main result of our p…

    Submitted 5 March, 2019; originally announced March 2019.

  12. Efficient Enumeration of Subgraphs and Induced Subgraphs with Bounded Girth

    Authors: Kazuhiro Kurita, Kunihiro Wasa, Alessio Conte, Hiroki Arimura, Takeaki Uno

    Abstract: The girth of a graph is the length of its shortest cycle. Due to its relevance in graph theory, network analysis and practical fields such as distributed computing, girth-related problems have been the object of attention in both past and recent literature. In this paper, we consider the problem of listing connected subgraphs with bounded girth. As a large girth is an index of sparsity, this allows to ex…

    Submitted 11 June, 2018; originally announced June 2018.
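
The girth definition in the abstract above (length of a shortest cycle) can be checked on small inputs with the textbook approach of running a breadth-first search from every vertex. This sketch is not the enumeration algorithm of the paper. It assumes a simple undirected graph given as an adjacency dictionary, and the name `girth` and the convention of returning infinity for acyclic graphs are choices made here for illustration.

```python
from collections import deque


def girth(adj):
    """Length of a shortest cycle in a simple undirected graph.

    adj maps each vertex to an iterable of its neighbors.
    Returns float("inf") if the graph has no cycle.
    """
    best = float("inf")
    for source in adj:
        dist = {source: 0}
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif parent[u] != v:
                    # Non-tree edge: closes a cycle of at most this length.
                    best = min(best, dist[u] + dist[v] + 1)
    return best


if __name__ == "__main__":
    five_cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
    print(girth(five_cycle))
```

Each search may overestimate the length of the cycle it detects, but the search started from a vertex on a shortest cycle finds its exact length, so the minimum over all sources is the girth.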

  13. Efficient Enumeration of Dominating Sets for Sparse Graphs

    Authors: Kazuhiro Kurita, Kunihiro Wasa, Hiroki Arimura, Takeaki Uno

    Abstract: A dominating set $D$ of a graph $G$ is a set of vertices such that every vertex in $G$ is in $D$ or has a neighbor in $D$. Enumeration of minimal dominating sets in a graph is one of the central problems in the study of enumeration, since enumeration of minimal dominating sets corresponds to enumeration of minimal hypergraph transversals. However, enumeration of dominating sets including non-minimal ones has no…

    Submitted 28 September, 2018; v1 submitted 21 February, 2018; originally announced February 2018.
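
To make the dominating-set definition in the abstract above concrete, the following sketch checks the condition directly and lists every dominating set (minimal or not) of a tiny graph by trying all vertex subsets. It takes exponential time and is unrelated to the efficient algorithm of the paper. The names `is_dominating` and `all_dominating_sets` are hypothetical.

```python
from itertools import combinations


def is_dominating(adj, candidate):
    """True if every vertex is in candidate or has a neighbor in candidate."""
    chosen = set(candidate)
    return all(v in chosen or any(u in chosen for u in adj[v]) for v in adj)


def all_dominating_sets(adj):
    """Brute-force list of all dominating sets, including non-minimal ones."""
    vertices = sorted(adj)
    result = []
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_dominating(adj, subset):
                result.append(set(subset))
    return result


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}  # the path 0 - 1 - 2
    for d in all_dominating_sets(path):
        print(sorted(d))
```

On the three-vertex path, the middle vertex alone is dominating, as is the pair of endpoints, and every superset of either is dominating as well, which gives five sets in total.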

  14. arXiv:1709.08770  [pdf, ps, other]

    stat.ML

    On the Model Shrinkage Effect of Gamma Process Edge Partition Models

    Authors: Iku Ohama, Issei Sato, Takuya Kida, Hiroki Arimura

    Abstract: The edge partition model (EPM) is a fundamental Bayesian nonparametric model for extracting an overlapping structure from a binary matrix. The EPM adopts a gamma process ($Γ$P) prior to automatically shrink the number of active atoms. However, we empirically found that the model shrinkage of the EPM does not typically work appropriately and leads to an overfitted solution. An analysis of the expecta…

    Submitted 25 September, 2017; originally announced September 2017.

    Comments: To appear in the 31st Annual Conference on Neural Information Processing Systems (NIPS2017)

  15. Efficient Enumeration of Induced Matchings in a Graph without Cycles with Length Four

    Authors: Kazuhiro Kurita, Kunihiro Wasa, Takeaki Uno, Hiroki Arimura

    Abstract: We address the induced matching enumeration problem. An edge set $M$ is an induced matching of a graph $G = (V,E)$. The enumeration of matchings has been widely studied in the literature, but induced matchings have not received much attention. A straightforward algorithm takes $O(|V|)$ time for each solution, which comes from the time to generate a subproblem. We investigated local structures that ena…

    Submitted 10 July, 2017; originally announced July 2017.

  16. arXiv:1705.09779  [pdf, ps, other]

    cs.DS

    Linear-size CDAWG: new repetition-aware indexing and grammar compression

    Authors: Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura

    Abstract: In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with $O(\tilde e_T \log n)$ bits of space allowing for $O(\log n)$-time random and $O(1)$-time sequential accesses to edge labels, and $O(m \log σ + occ)$-tim…

    Submitted 27 July, 2017; v1 submitted 27 May, 2017; originally announced May 2017.

    Comments: 12 pages, 2 figures

  17. Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

    Authors: Takuya Takagi, Shunsuke Inenaga, Kunihiko Sadakane, Hiroki Arimura

    Abstract: In this paper, we present a new data structure called the packed compact trie (packed c-trie) which stores a set $S$ of $k$ strings of total length $n$ in $n \log σ + O(k \log n)$ bits of space and supports fast pattern matching queries and updates, where $σ$ is the size of an alphabet. Assume that $α = \log_σ n$ letters are packed in a single machine word on the standard word RAM model, and let…

    Submitted 1 February, 2016; originally announced February 2016.

    Comments: 10 pages, 2 figures

  18. arXiv:1507.07622  [pdf, other]

    cs.DS

    Fully-Online Suffix Tree and Directed Acyclic Word Graph Construction for Multiple Texts

    Authors: Takuya Takagi, Shunsuke Inenaga, Hiroki Arimura, Dany Breslauer, Diptarama Hendrian

    Abstract: We consider construction of the suffix tree and the directed acyclic word graph (DAWG) indexing data structures for a collection $\mathcal{T}$ of texts, where a new symbol may be appended to any text in $\mathcal{T} = \{T_1, \ldots, T_K\}$, at any time. This fully-online scenario, which arises in dynamically indexing multi-sensor data, is a natural generalization of the long-solved semi-online tex…

    Submitted 12 July, 2018; v1 submitted 27 July, 2015; originally announced July 2015.

    Comments: 28 pages, 6 figures, LaTeX

  19. Efficient Enumeration of Induced Subtrees in a K-Degenerate Graph

    Authors: Kunihiro Wasa, Hiroki Arimura, Takeaki Uno

    Abstract: In this paper, we address the problem of enumerating all induced subtrees in an input k-degenerate graph, where an induced subtree is an acyclic and connected induced subgraph. A graph G = (V, E) is a k-degenerate graph if every induced subgraph of G has a vertex whose degree is less than or equal to k, and many real-world graphs have small degeneracies, or very close to small degeneracies. Altho…

    Submitted 23 July, 2014; originally announced July 2014.