Search | arXiv e-print repository

arXiv:2012.10092 [pdf, other]

The Parameterized Suffix Tray

Authors: Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Let $Σ$ and $Π$ be disjoint alphabets, respectively called the static alphabet and the parameterized alphabet. Two strings $x$ and $y$ over $Σ\cup Π$ of equal length are said to parameterized match (p-match) if there exists a renaming bijection $f$ on $Σ$ and $Π$ which is identity on $Σ$ and maps the characters of $x$ to those of $y$ so that the two strings become identical. The indexing version of the problem of finding p-matching occurrences of a given pattern in the text is a well-studied topic in string matching. In this paper, we present a state-of-the-art indexing structure for p-matching called the parameterized suffix tray of an input text $T$, denoted by $\mathsf{PSTray}(T)$. We show that $\mathsf{PSTray}(T)$ occupies $O(n)$ space and supports pattern matching queries in $O(m + \log (σ+π) + \mathit{occ})$ time, where $n$ is the length of $T$, $m$ is the length of a query pattern $P$, $π$ is the number of distinct symbols of $Π$ in $T$, $σ$ is the number of distinct symbols of $Σ$ in $T$ and $\mathit{occ}$ is the number of p-matching occurrences of $P$ in $T$. We also present how to build $\mathsf{PSTray}(T)$ in $O(n)$ time from the parameterized suffix tree of $T$.

Submitted 3 February, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: Accepted for CIAC 2021
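
The p-match definition in the abstract above can be illustrated with a small check. This is only a sketch, not the paper's index: it uses the classical "prev-encoding" idea (each parameterized character is replaced by the distance to its previous occurrence, or 0 if there is none), which the abstract itself does not mention. The function names and the `params` argument are hypothetical.

```python
def prev_encode(s, params):
    """Replace each parameterized character by the distance to its previous
    occurrence (0 if it has not occurred yet); static characters stay as-is."""
    last_seen = {}
    encoded = []
    for i, c in enumerate(s):
        if c in params:
            encoded.append(i - last_seen[c] if c in last_seen else 0)
            last_seen[c] = i
        else:
            encoded.append(c)  # static character: must match exactly
    return encoded


def p_match(x, y, params):
    """True if equal-length strings x and y parameterized-match,
    i.e. their prev-encodings coincide."""
    return len(x) == len(y) and prev_encode(x, params) == prev_encode(y, params)


# Assumed alphabets for the example: static {a, b}, parameterized {x, y, z}.
PARAMS = set("xyz")
print(p_match("axaybx", "ayazby", PARAMS))  # True: rename x->y, y->z
print(p_match("axax", "axay", PARAMS))      # False: x would need two images
```

Comparing two strings this way takes linear time; the point of an index such as the one described in the abstract is to avoid re-scanning the text for every query.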

arXiv:2011.12527 [pdf, other]

Match Them Up: Visually Explainable Few-shot Image Classification

Authors: Bowen Wang, Liangzhi Li, Manisha Verma, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara

Abstract: Few-shot learning (FSL) approaches are usually based on an assumption that the pre-trained knowledge can be obtained from base (seen) categories and can be well transferred to novel (unseen) categories. However, there is no guarantee, especially for the latter part. This issue leads to the unknown nature of the inference process in most FSL methods, which hampers its application in some risk-sensitive areas. In this paper, we reveal a new way to perform FSL for image classification, using visual representations from the backbone model and weights generated by a newly-emerged explainable classifier. The weighted representations only include a minimum number of distinguishable features and the visualized weights can serve as an informative hint for the FSL process. Finally, a discriminator will compare the representations of each pair of the images in the support set and the query set. Pairs with the highest scores will decide the classification results. Experimental results prove that the proposed method can achieve both good accuracy and satisfactory explainability on three mainstream datasets.

Submitted 25 November, 2020; originally announced November 2020.

arXiv:2011.03772 [pdf, other]

Automated Grading System of Retinal Arterio-venous Crossing Patterns: A Deep Learning Approach Replicating Ophthalmologist's Diagnostic Process of Arteriolosclerosis

Authors: Liangzhi Li, Manisha Verma, Bowen Wang, Yuta Nakashima, Hajime Nagahara, Ryo Kawasaki

Abstract: The status of retinal arteriovenous crossing is of great significance for clinical evaluation of arteriolosclerosis and systemic hypertension. As an ophthalmological diagnostic criterion, Scheie's classification has been used to grade the severity of arteriolosclerosis. In this paper, we propose a deep learning approach to support the diagnosis process, which, to the best of our knowledge, is one of the earliest attempts in medical imaging. The proposed pipeline is three-fold. First, we adopt segmentation and classification models to automatically obtain vessels in a retinal image with the corresponding artery/vein labels and find candidate arteriovenous crossing points. Second, we use a classification model to validate the true crossing point. Finally, the grade of severity for the vessel crossings is classified. To better address the problem of label ambiguity and imbalanced label distribution, we propose a new model, named multi-diagnosis team network (MDTNet), in which the sub-models with different structures or different loss functions provide different decisions. MDTNet unifies these diverse theories to give the final decision with high accuracy. Our severity grading method was able to validate crossing points with precision and recall of 96.3% and 96.3%, respectively. Among correctly detected crossing points, the kappa value for the agreement between the grading by a retina specialist and the estimated score was 0.85, with an accuracy of 0.92. The numerical results demonstrate that our method can achieve a good performance in both arteriovenous crossing validation and severity grading tasks. By the proposed models, we could build a pipeline reproducing retina specialist's subjective grading without feature extractions. The code is available for reproducibility.

Submitted 1 December, 2022; v1 submitted 7 November, 2020; originally announced November 2020.

Comments: Accepted in PLOS Digital Health

arXiv:2010.09466 [pdf, other]

Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

Authors: Bowen Wang, Liangzhi Li, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara, Yasushi Yagi

Abstract: Semantic video segmentation is a key challenge for various applications. This paper presents a new model named Noisy-LSTM, which is trainable in an end-to-end manner, with convolutional LSTMs (ConvLSTMs) to leverage the temporal coherency in video frames. We also present a simple yet effective training strategy, which replaces a frame in a video sequence with noise. This strategy spoils the temporal coherency in video frames during training and thus makes the temporal links in ConvLSTMs unreliable, which may consequently improve feature extraction from video frames, as well as serve as a regularizer to avoid overfitting, without requiring extra data annotation or computational costs. Experimental results demonstrate that the proposed model can achieve state-of-the-art performance on both the CityScapes and EndoVis2018 datasets.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2010.05185 [pdf, other]

Constructing a Visual Relationship Authenticity Dataset

Authors: Chenhui Chu, Yuto Takebayashi, Mishra Vipul, Yuta Nakashima

Abstract: A visual relationship denotes a relationship between two objects in an image, which can be represented as a triplet of (subject; predicate; object). Visual relationship detection is crucial for scene understanding in images. Existing visual relationship detection datasets only contain true relationships that correctly describe the content in an image. However, distinguishing false visual relationships from true ones is also crucial for image understanding and grounded natural language processing. In this paper, we construct a visual relationship authenticity dataset, in which both true and false relationships among all objects that appear in the captions of the Flickr30k entities image caption dataset are annotated. The dataset is available at https://github.com/codecreator2053/VR_ClassifiedDataset. We hope that this dataset can promote the study on both vision and language understanding.

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2009.14545 [pdf, other]

Demographic Influences on Contemporary Art with Unsupervised Style Embeddings

Authors: Nikolai Huckle, Noa Garcia, Yuta Nakashima

Abstract: Computational art analysis has, through its reliance on classification tasks, prioritised historical datasets in which the artworks are already well sorted with the necessary annotations. Art produced today, on the other hand, is numerous and easily accessible, through the internet and social networks that are used by professional and amateur artists alike to display their work. Although this art, yet unsorted in terms of style and genre, is less suited for supervised analysis, the data sources come with novel information that may help frame the visual content in equally novel ways. As a first step in this direction, we present contempArt, a multi-modal dataset of exclusively contemporary artworks. contempArt is a collection of paintings and drawings, a detailed graph network based on social connections on Instagram and additional socio-demographic information; all attached to 442 artists at the beginning of their career. We evaluate three methods suited for generating unsupervised style embeddings of images and correlate them with the remaining data. We find no connections between visual style on the one hand and social proximity, gender, and nationality on the other.

Submitted 1 December, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

Comments: To be published in Proceedings of the European Conference in Computer Vision Workshops 2020

arXiv:2009.06138 [pdf, other]

SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

Authors: Liangzhi Li, Bowen Wang, Manisha Verma, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara

Abstract: Explainable artificial intelligence has been gaining attention in the past few years. However, most existing methods are based on gradients or intermediate features, which are not directly involved in the decision-making process of the classifier. In this paper, we propose a slot attention-based classifier called SCOUTER for transparent yet accurate classification. Two major differences from other attention-based methods include: (a) SCOUTER's explanation is involved in the final confidence for each category, offering more intuitive interpretation, and (b) all the categories have their corresponding positive or negative explanation, which tells "why the image is of a certain category" or "why the image is not of a certain category." We design a new loss tailored for SCOUTER that controls the model's behavior to switch between positive and negative explanations, as well as the size of explanatory regions. Experimental results show that SCOUTER can give better visual explanations in terms of various metrics while keeping good accuracy on small and medium-sized datasets.

Submitted 20 August, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

arXiv:2009.00325 [pdf, other]

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

Abstract: Query-based moment retrieval is the problem of localising a specific clip from an untrimmed video according to a query sentence. This is a challenging task that requires interpretation of both the natural language query and the video content. Like in many other areas in computer vision and machine learning, the progress in query-based moment retrieval is heavily driven by the benchmark datasets and, therefore, their quality has significant impact on the field. In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task. Our results indicate substantial biases in the popular datasets and unexpected behaviour of the state-of-the-art models. Moreover, we present new sanity check experiments and approaches for visualising the results. Finally, we suggest possible directions to improve the temporal sentence grounding in the future. Our code for this paper is available at https://mayu-ot.github.io/hidden-challenges-MR .

Submitted 7 October, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

Comments: British Machine Vision Conference (BMVC), 2020. (v2) added references

arXiv:2008.12520 [pdf, other]

A Dataset and Baselines for Visual Question Answering on Art

Authors: Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui Chu, Yuta Nakashima, Teruko Mitamura

Abstract: Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.

Submitted 28 August, 2020; originally announced August 2020.

arXiv:2007.11365 [pdf, other]

doi 10.1109/TPAMI.2021.3076522

Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition

Authors: Sudhakar Kumawat, Manisha Verma, Yuta Nakashima, Shanmuganathan Raman

Abstract: Conventional 3D convolutional neural networks (CNNs) are computationally expensive, memory intensive, prone to overfitting, and most importantly, there is a need to improve their feature learning capabilities. To address these issues, we propose spatio-temporal short term Fourier transform (STFT) blocks, a new class of convolutional blocks that can serve as an alternative to the 3D convolutional layer and its variants in 3D CNNs. An STFT block consists of non-trainable convolution layers that capture spatially and/or temporally local Fourier information using an STFT kernel at multiple low frequency points, followed by a set of trainable linear weights for learning channel correlations. The STFT blocks significantly reduce the space-time complexity in 3D CNNs. In general, they use 3.5 to 4.5 times fewer parameters and 1.5 to 1.8 times less computational cost when compared to the state-of-the-art methods. Furthermore, their feature learning capabilities are significantly better than the conventional 3D convolutional layer and its variants. Our extensive evaluation on seven action recognition datasets, including Something-something v1 and v2, Jester, Diving-48, Kinetics-400, UCF 101, and HMDB 51, demonstrates that STFT block-based 3D CNNs achieve on par or even better performance compared to the state-of-the-art methods.

Submitted 22 July, 2020; originally announced July 2020.

Comments: Extended version of our CVPR 2019 work

arXiv:2007.08751 [pdf, other]

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

Authors: Noa Garcia, Yuta Nakashima

Abstract: To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall storyline already seen. Inspired by this behaviour, we design ROLL, a model for knowledge-based video story question answering that leverages three crucial aspects of movie understanding: dialog comprehension, scene reasoning, and storyline recalling. In ROLL, each of these tasks is in charge of extracting rich and diverse information by 1) processing scene dialogues, 2) generating unsupervised video scene descriptions, and 3) obtaining external knowledge in a weakly supervised fashion. To answer a given question correctly, the information generated by each cognitive-inspired task is encoded via Transformers and fused through a modality weighting mechanism, which balances the information from the different sources. Exhaustive evaluation demonstrates the effectiveness of our approach, which yields a new state-of-the-art on two challenging video question answering datasets: KnowIT VQA and TVQA+.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2006.13576 [pdf, other]

Lyndon Words, the Three Squares Lemma, and Primitive Squares

Authors: Hideo Bannai, Takuya Mieno, Yuto Nakashima

Abstract: We revisit the so-called "Three Squares Lemma" by Crochemore and Rytter [Algorithmica 1995] and, using arguments based on Lyndon words, derive a more general variant which considers three overlapping squares which do not necessarily share a common prefix. We also give an improved upper bound of $n\log_2 n$ on the maximum number of (occurrences of) primitively rooted squares in a string of length $n$, also using arguments based on Lyndon words. To the best of our knowledge, the only known upper bound was $n \log_φn \approx 1.441n\log_2 n$, where $φ$ is the golden ratio, reported by Fraenkel and Simpson [TCS 1999] obtained via the Three Squares Lemma.

Submitted 22 July, 2020; v1 submitted 24 June, 2020; originally announced June 2020.
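
To make the quantity bounded in the abstract above concrete, the following brute-force sketch counts occurrences of primitively rooted squares (substrings of the form $uu$ where $u$ is not a power of a shorter string). It is an illustration only and is unrelated to the Lyndon-word arguments of the paper; the function names are hypothetical.

```python
def is_primitive(u):
    """A non-empty string u is primitive iff it is not a power v^k with k >= 2.
    Equivalently, u first re-occurs inside u+u exactly at offset len(u)."""
    return len(u) > 0 and (u + u).find(u, 1) == len(u)


def count_primitive_squares(w):
    """Count occurrences (start position, root length) of squares uu in w
    whose root u is primitive. Naive enumeration, for small inputs only."""
    n = len(w)
    count = 0
    for p in range(1, n // 2 + 1):           # root length
        for i in range(n - 2 * p + 1):       # start position
            root = w[i:i + p]
            if root == w[i + p:i + 2 * p] and is_primitive(root):
                count += 1
    return count


print(count_primitive_squares("aaaa"))  # 3: "aa" at positions 0, 1, 2
print(count_primitive_squares("abab"))  # 1: "abab" with root "ab"
```

For a string of length $n$, the value returned can be compared against the $n\log_2 n$ bound stated in the abstract.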

arXiv:2006.02134 [pdf, other]

Palindromic Trees for a Sliding Window and Its Applications

Authors: Takuya Mieno, Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: The palindromic tree (a.k.a. eertree) for a string $S$ of length $n$ is a tree-like data structure that represents the set of all distinct palindromic substrings of $S$, using $O(n)$ space [Rubinchik and Shur, 2018]. It is known that, when $S$ is over an alphabet of size $σ$ and is given in an online manner, then the palindromic tree of $S$ can be constructed in $O(n\logσ)$ time with $O(n)$ space. In this paper, we consider the sliding window version of the problem: For a sliding window of length at most $d$, we present two versions of an algorithm which maintains the palindromic tree of size $O(d)$ for every sliding window $S[i..j]$ over $S$, where $1 \leq j-i+1 \leq d$. The first version works in $O(n\logσ')$ time with $O(d)$ space where $σ' \leq d$ is the maximum number of distinct characters in the windows, and the second one works in $O(n + dσ)$ time with $(d+2)σ+ O(d)$ space. We also show how our algorithms can be applied to efficient computation of minimal unique palindromic substrings (MUPS) and minimal absent palindromic words (MAPW) for a sliding window.

Submitted 11 November, 2020; v1 submitted 3 June, 2020; originally announced June 2020.
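
As background for the abstract above, here is a minimal sketch of the plain online palindromic tree (eertree) construction, not the sliding-window variant that the paper contributes. It stores one node per distinct palindromic substring and uses Python dictionaries for the edges, which is an implementation assumption; the function name is hypothetical.

```python
def count_distinct_palindromes(s):
    """Build the palindromic tree of s online and return the number of
    distinct non-empty palindromic substrings (one tree node each)."""
    # Node 0: imaginary root of length -1; node 1: empty palindrome.
    length = [-1, 0]
    link = [0, 0]          # suffix links (longest proper palindromic suffix)
    edges = [{}, {}]       # edges[v][c] = node for the palindrome c + v + c
    last = 1               # node of the longest palindromic suffix so far

    def extendable(v, i):
        """Longest palindromic suffix X on v's link chain with s[i] X s[i]
        a suffix of s[:i+1]."""
        while True:
            j = i - length[v] - 1
            if j >= 0 and s[j] == s[i]:
                return v
            v = link[v]

    for i, c in enumerate(s):
        cur = extendable(last, i)
        if c in edges[cur]:              # palindrome already known
            last = edges[cur][c]
            continue
        new = len(length)
        length.append(length[cur] + 2)
        edges.append({})
        if length[new] == 1:
            link.append(1)               # single character links to empty
        else:
            link.append(edges[extendable(link[cur], i)][c])
        edges[cur][c] = new
        last = new
    return len(length) - 2               # exclude the two roots


print(count_distinct_palindromes("abba"))     # 4: a, b, bb, abba
print(count_distinct_palindromes("eertree"))  # 7
```

Each appended character creates at most one new node, which is why the structure has $O(n)$ size; maintaining it under deletions from the left end of a window is the harder problem the abstract addresses.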

arXiv:2005.13337 [pdf, other]

Joint Learning of Vessel Segmentation and Artery/Vein Classification with Post-processing

Authors: Liangzhi Li, Manisha Verma, Yuta Nakashima, Ryo Kawasaki, Hajime Nagahara

Abstract: Retinal imaging serves as a valuable tool for diagnosis of various diseases. However, reading retinal images is a difficult and time-consuming task even for experienced specialists. The fundamental step towards automated retinal image analysis is vessel segmentation and artery/vein classification, which provide various information on potential disorders. To improve the performance of the existing automated methods for retinal image analysis, we propose a two-step vessel classification. We adopt a UNet-based model, SeqNet, to accurately segment vessels from the background and make predictions on the vessel type. Our model does segmentation and classification sequentially, which alleviates the problem of label distribution bias and facilitates training. To further refine classification results, we post-process them considering the structural information among vessels to propagate highly confident predictions to surrounding vessels. Our experiments show that our method improves AUC to 0.98 for segmentation and the accuracy to 0.92 in classification on the DRIVE dataset.

Submitted 27 May, 2020; originally announced May 2020.

Comments: Accepted in Medical Imaging with Deep Learning (MIDL) 2020

arXiv:2005.09524 [pdf, other]

On repetitiveness measures of Thue-Morse words

Authors: Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: We show that the size $γ(t_n)$ of the smallest string attractor of the $n$th Thue-Morse word $t_n$ is 4 for any $n\geq 4$, disproving the conjecture by Mantaci et al. [ICTCS 2019] that it is $n$. We also show that $δ(t_n) = \frac{10}{3+2^{4-n}}$ for $n \geq 3$, where $δ(w)$ is the maximum over all $k = 1,\ldots,|w|$, the number of distinct substrings of length $k$ in $w$ divided by $k$, which is a measure of repetitiveness recently studied by Kociumaka et al. [LATIN 2020]. Furthermore, we show that the number $z(t_n)$ of factors in the self-referencing Lempel-Ziv factorization of $t_n$ is exactly $2n$.

Submitted 12 August, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: accepted to SPIRE 2020
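
The measure $δ$ defined in the abstract above is easy to compute by brute force on small inputs. The sketch below assumes the common indexing $t_0 = \texttt{a}$ and $t_n = t_{n-1}\overline{t_{n-1}}$ (so $|t_n| = 2^n$), which the abstract does not state explicitly; function names are hypothetical.

```python
from fractions import Fraction


def thue_morse(n):
    """n-th Thue-Morse word over {a, b}, assuming t_0 = 'a' and
    t_n = t_{n-1} followed by its complement (length 2**n)."""
    swap = str.maketrans("ab", "ba")
    t = "a"
    for _ in range(n):
        t += t.translate(swap)
    return t


def delta(w):
    """delta(w) = max over k of (#distinct substrings of length k) / k,
    computed exactly by naive enumeration."""
    best = Fraction(0)
    for k in range(1, len(w) + 1):
        distinct = {w[i:i + k] for i in range(len(w) - k + 1)}
        best = max(best, Fraction(len(distinct), k))
    return best


def closed_form(n):
    """The formula 10 / (3 + 2^(4-n)) quoted in the abstract, for n >= 3."""
    return Fraction(10) / (3 + Fraction(2) ** (4 - n))


for n in (3, 4):
    print(n, delta(thue_morse(n)), closed_form(n))
```

Under the assumed indexing, the brute-force value and the closed form agree for the small cases checked here ($n = 3$ and $n = 4$).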

arXiv:2005.08190 [pdf, other]

Towards Efficient Interactive Computation of Dynamic Time Warping Distance

Authors: Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: The dynamic time warping (DTW) is a widely-used method that allows us to efficiently compare two time series that can vary in speed. Given two strings $A$ and $B$ of respective lengths $m$ and $n$, there is a fundamental dynamic programming algorithm that computes the DTW distance for $A$ and $B$ together with an optimal alignment in $Θ(mn)$ time and space. In this paper, we tackle the problem of interactive computation of the DTW distance for dynamic strings, denoted $\mathrm{D^2TW}$, where character-wise edit operation (insertion, deletion, substitution) can be performed at an arbitrary position of the strings. Let $M$ and $N$ be the sizes of the run-length encoding (RLE) of $A$ and $B$, respectively. We present an algorithm for $\mathrm{D^2TW}$ that occupies $Θ(mN+nM)$ space and uses $O(m+n+\#_{\mathrm{chg}}) \subseteq O(mN + nM)$ time to update a compact differential representation $\mathit{DS}$ of the DP table per edit operation, where $\#_{\mathrm{chg}}$ denotes the number of cells in $\mathit{DS}$ whose values change after the edit operation. Our method is at least as efficient as the algorithm recently proposed by Froese et al. running in $Θ(mN + nM)$ time, and is faster when $\#_{\mathrm{chg}}$ is smaller than $O(mN + nM)$ which, as our preliminary experiments suggest, is likely to be the case in the majority of instances.

Submitted 29 July, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: Accepted for SPIRE 2020
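
The "fundamental dynamic programming algorithm" mentioned in the abstract above can be sketched as follows. This is the textbook $Θ(mn)$ table, not the paper's dynamic $\mathrm{D^2TW}$ structure; the local cost $|a - b|$ on numeric sequences is an assumption made for the example, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two non-empty numeric sequences,
    using |x - y| as the (assumed) local cost. Theta(m*n) time and space."""
    m, n = len(a), len(b)
    inf = float("inf")
    # table[i][j] = DTW distance between a[:i] and b[:j]
    table = [[inf] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])
            table[i][j] = cost + min(
                table[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                table[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                table[i - 1][j - 1],  # both sequences advance
            )
    return table[m][n]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0: the repeated 2 is absorbed
print(dtw_distance([1, 3], [1, 2, 3]))        # 1
```

A single edit to either input can change many cells of this table, which is the cost the differential representation in the abstract is designed to limit.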

arXiv:2004.10362 [pdf, other]

Yoga-82: A New Dataset for Fine-grained Classification of Human Poses

Authors: Manisha Verma, Sudhakar Kumawat, Yuta Nakashima, Shanmuganathan Raman

Abstract: Human pose estimation is a well-known problem in computer vision to locate joint positions. Existing datasets for the learning of poses are observed to be not challenging enough in terms of pose diversity, object occlusion, and viewpoints. This makes the pose annotation process relatively simple and restricts the application of the models that have been trained on them. To handle more variety in human poses, we propose the concept of fine-grained hierarchical pose classification, in which we formulate the pose estimation as a classification task, and propose a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes. Yoga-82 consists of complex poses where fine annotations may not be possible. To resolve this, we provide hierarchical labels for yoga poses based on the body configuration of the pose. The dataset contains a three-level hierarchy including body positions, variations in body positions, and the actual pose names. We present the classification accuracy of the state-of-the-art convolutional neural network architectures on Yoga-82. We also present several hierarchical variants of DenseNet in order to utilize the hierarchical labels.

Submitted 21 April, 2020; originally announced April 2020.

Comments: Accepted CVPR Workshops 2020

arXiv:2004.08385 [pdf, other]

Knowledge-Based Visual Question Answering in Videos

Authors: Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

Abstract: We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions, which can only be answered with the experience obtained from viewing the series. Second, we propose a video understanding model by combining the visual and textual video content with specific knowledge about the show. Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying current video modelling limitations.

Submitted 16 April, 2020; originally announced April 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1910.10706

arXiv:2004.05309 [pdf, other]

Grammar-compressed Self-index with Lyndon Words

Authors: Kazuya Tsuruta, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of $O(g)$ words of space that can be built from a string $T$ in $O(n \lg n)$ expected time, retrieving the starting positions of all occurrences of a pattern $P$ of length $m$ in $O(m + \lg m \lg n + occ \lg g)$ time, where $n$ is the length of $T$, $g$ is the size of the Lyndon SLP for $T$, and $occ$ is the number of occurrences of $P$ in $T$.

Submitted 27 April, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

arXiv:2002.06796 [pdf, other]

Detecting $k$-(Sub-)Cadences and Equidistant Subsequence Occurrences

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara

Abstract: The equidistant subsequence pattern matching problem is considered. Given a pattern string $P$ and a text string $T$, we say that $P$ is an \emph{equidistant subsequence} of $T$ if $P$ is a subsequence of the text such that consecutive symbols of $P$ in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield $o(n^2)$ time algorithms for finding $k$-(sub-)cadences and equidistant subsequences. Furthermore, $O(n\log^2 n)$ and $O(n\log n)$ time algorithms, respectively for equidistant and Abelian equidistant matching for the case $|P| = 3$, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints.

Submitted 17 February, 2020; originally announced February 2020.
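
Illustration (not from the paper): the definition of an equidistant subsequence in the abstract above can be checked directly by brute force. The sketch below is a naive baseline, not the bit-parallel or convolution-based algorithms the authors describe. The function name `equidistant_occurrences`, the 0-based positions, and the choice to allow a gap of 1 (an ordinary substring occurrence) are assumptions made for this example.

```python
def equidistant_occurrences(text, pattern):
    """Return all (start, gap) pairs such that pattern[k] == text[start + k*gap].

    Naive enumeration over every start position and every gap >= 1.
    Assumes len(pattern) >= 2, since a gap is undefined for a single symbol.
    """
    n, m = len(text), len(pattern)
    if m < 2:
        raise ValueError("pattern must contain at least two symbols")
    hits = []
    for start in range(n):
        if text[start] != pattern[0]:
            continue
        # Largest gap that keeps the last pattern symbol inside the text.
        max_gap = (n - 1 - start) // (m - 1)
        for gap in range(1, max_gap + 1):
            if all(text[start + k * gap] == pattern[k] for k in range(m)):
                hits.append((start, gap))
    return hits


print(equidistant_occurrences("abcabc", "ac"))  # [(0, 2), (0, 5), (3, 2)]
```

Each reported pair gives the position of the first matched symbol and the spacing between consecutive matched symbols.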

arXiv:2002.06786 [pdf, other]

doi 10.1016/j.tcs.2022.09.008

Parameterized DAWGs: efficient constructions and bidirectional pattern searches

Authors: Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

Abstract: Two strings $x$ and $y$ over $Σ\cup Π$ of equal length are said to \emph{parameterized match} (\emph{p-match}) if there is a renaming bijection $f:Σ\cup Π\rightarrow Σ\cup Π$ that is identity on $Σ$ and transforms $x$ to $y$ (or vice versa). The \emph{p-matching} problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose \emph{parameterized suffix automata} (\emph{p-suffix automata}) and \emph{parameterized directed acyclic word graphs} (\emph{PDAWGs}) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have $Θ(n^2)$ nodes and edges but PDAWGs have only $O(n)$ nodes and edges, where $n$ is the length of an input string. We also give an $O(n |Π| \log (|Π| + |Σ|))$-time $O(n)$-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the \emph{parameterized suffix tree} for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: Given the parameterized suffix tree for the reversal of the input string $T$, one can build the PDAWG of $T$ in $O(n)$ time in an offline manner; One can perform \emph{bidirectional} p-matching in $O(m \log (|Π|+|Σ|) + \mathit{occ})$ time using $O(n)$ space, where $m$ denotes the pattern length and $\mathit{occ}$ is the number of pattern occurrences in the text $T$.

Submitted 16 September, 2022; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: 28 pages, 7 figures

Journal ref: Theoretical Computer Science (2022)
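
Illustration (not from the paper): the p-match definition in the abstract above can be tested with the standard "prev-encoding" attributed to Baker, in which every parameter symbol is replaced by the distance to its previous occurrence (0 for a first occurrence) and static symbols are kept as they are. Two equal-length strings p-match exactly when their encodings coincide. The sketch below is a plain linear-time check of that definition, not the PDAWG construction. The names `prev_encode` and `p_match`, and the convention of passing the static alphabet as a set of one-character strings, are assumptions made for this example.

```python
def prev_encode(s, static):
    """Prev-encode s: static symbols stay, parameter symbols become distances.

    A parameter symbol is replaced by the distance to its previous occurrence,
    or by 0 if it has not occurred before. Static symbols are strings and
    distances are integers, so the two kinds of entries never compare equal.
    """
    last_seen = {}
    encoded = []
    for i, ch in enumerate(s):
        if ch in static:
            encoded.append(ch)
        else:
            encoded.append(i - last_seen[ch] if ch in last_seen else 0)
            last_seen[ch] = i
    return encoded


def p_match(x, y, static):
    """Return True if x and y parameterized-match with respect to `static`."""
    return len(x) == len(y) and prev_encode(x, static) == prev_encode(y, static)


static = set("AB")                        # hypothetical static alphabet
print(prev_encode("xAyxB", static))       # [0, 'A', 0, 3, 'B']
print(p_match("xAyxB", "yAxyB", static))  # True: rename x -> y and y -> x
print(p_match("xAyxB", "xAyyB", static))  # False: no bijection works
```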

arXiv:2001.05671 [pdf, ps, other]

Faster STR-EC-LCS Computation

Authors: Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of two given strings $A$ and $B$. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string $Z$ is said to be an STR-EC-LCS of two given strings $A$ and $B$ excluding $P$ if $Z$ is one of the longest common subsequences of $A$ and $B$ that does not contain $P$ as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in $O(mnr)$ time and space where $m = |A|, n = |B|, r = |P|$ [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in $O(n|Σ| + (L+1)(m-L+1)r)$ time where $|Σ| \leq \min\{m, n\}$ denotes the number of distinct characters occurring in both $A$ and $B$, and $L$ is the length of the STR-EC-LCS. This algorithm is faster than the $O(mnr)$-time algorithm for short/long STR-EC-LCS (namely, $L \in O(1)$ or $m-L \in O(1)$), and is at least as efficient as the $O(mnr)$-time algorithm for all cases.

Submitted 16 January, 2020; originally announced January 2020.

arXiv:1912.05763 [pdf, other]

IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks

Authors: Liangzhi Li, Manisha Verma, Yuta Nakashima, Hajime Nagahara, Ryo Kawasaki

Abstract: Retinal vessel segmentation is of great interest for diagnosis of retinal vascular diseases. To further improve the performance of vessel segmentation, we propose IterNet, a new model based on UNet, with the ability to find obscured details of the vessel from the segmented vessel image itself, rather than the raw input image. IterNet consists of multiple iterations of a mini-UNet, which can be 4$\times$ deeper than the common UNet. IterNet also adopts the weight-sharing and skip-connection features to facilitate training; therefore, even with such a large architecture, IterNet can still learn from merely 10$\sim$20 labeled images, without pre-training or any prior knowledge. IterNet achieves AUCs of 0.9816, 0.9851, and 0.9881 on three mainstream datasets, namely DRIVE, CHASE-DB1, and STARE, respectively, which currently are the best scores in the literature. The source code is available.

Submitted 11 December, 2019; originally announced December 2019.

Comments: Accepted in 2020 Winter Conference on Applications of Computer Vision (WACV 20)

arXiv:1910.10706 [pdf, other]

KnowIT VQA: Answering Knowledge-Based Questions about Videos

Authors: Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

Abstract: We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions, which require experience obtained from viewing the series to be answered. Second, we propose a video understanding model by combining the visual and textual video content with specific knowledge about the show. Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying current video modelling limitations.

Submitted 23 December, 2019; v1 submitted 22 October, 2019; originally announced October 2019.

arXiv:1909.12932 [pdf, other]

BUDA.ART: A Multimodal Content-Based Analysis and Retrieval System for Buddha Statues

Authors: Benjamin Renoust, Matheus Oliveira Franca, Jacob Chan, Van Le, Ayaka Uesaka, Yuta Nakashima, Hajime Nagahara, Jueren Wang, Yutaka Fujioka

Abstract: We introduce BUDA.ART, a system designed to assist researchers in Art History, to explore and analyze an archive of pictures of Buddha statues. The system combines different CBIR and classical retrieval techniques to assemble 2D pictures, 3D statue scans and meta-data, that is focused on the Buddha facial characteristics. We build the system from an archive of 50,000 Buddhism pictures, identify unique Buddha statues, extract contextual information, and provide specific facial embedding to first index the archive. The system allows for mobile, on-site search, and to explore similarities of statues in the archive. In addition, we provide search visualization and 3D analysis of the statues.

Submitted 17 September, 2019; originally announced September 2019.

Comments: Demo video at: https://www.youtube.com/watch?v=3XJvLjSWieY

arXiv:1909.12921 [pdf, other]

Historical and Modern Features for Buddha Statue Classification

Authors: Benjamin Renoust, Matheus Oliveira Franca, Jacob Chan, Noa Garcia, Van Le, Ayaka Uesaka, Yuta Nakashima, Hajime Nagahara, Jueren Wang, Yutaka Fujioka

Abstract: While Buddhism has spread along the Silk Roads, many pieces of art have been displaced. Only a few experts may identify these works, subjectively to their experience. The construction of Buddha statues was taught through the definition of canon rules, but the applications of those rules vary greatly across time and space. Automatic art analysis aims at supporting these challenges. We propose to automatically recover the proportions induced by the construction guidelines, in order to use them and compare between different deep learning features for several classification tasks, in a medium size but rich dataset of Buddha statues, collected with experts of Buddhism art history.

Submitted 6 October, 2019; v1 submitted 17 September, 2019; originally announced September 2019.

arXiv:1909.02804 [pdf, ps, other]

Minimal Unique Substrings and Minimal Absent Words in a Sliding Window

Authors: Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: A substring $u$ of a string $T$ is called a minimal unique substring (MUS) of $T$ if $u$ occurs exactly once in $T$ and any proper substring of $u$ occurs at least twice in $T$. A string $w$ is called a minimal absent word (MAW) of $T$ if $w$ does not occur in $T$ and any proper substring of $w$ occurs in $T$. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string $T$. We first show how the set of MUSs can change in a sliding window over $T$, and present an $O(n\logσ)$-time and $O(d)$-space algorithm to compute MUSs in a sliding window of width $d$ over $T$, where $σ$ is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over $T$. Our bounds improve on the previous results in [Crochemore et al., 2017].

Submitted 13 September, 2019; v1 submitted 6 September, 2019; originally announced September 2019.
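
Illustration (not from the paper): the MUS definition in the abstract above can be evaluated by brute force on a short string. The sketch below counts every substring occurrence and keeps the substrings that occur once while both of their maximal proper substrings occur at least twice; checking only those two is enough because occurrence counts cannot decrease when a substring is shortened. It is a quadratic-space baseline for experimentation, not the sliding-window algorithm of the paper, and the function name `minimal_unique_substrings` is an assumption made for this example.

```python
from collections import Counter


def minimal_unique_substrings(text):
    """Return the sorted list of minimal unique substrings (MUSs) of text."""
    n = len(text)
    # Count occurrences (overlaps included) of every non-empty substring.
    counts = Counter(text[i:j] for i in range(n) for j in range(i + 1, n + 1))
    result = []
    for u, c in counts.items():
        if c != 1:
            continue
        # A unique single character is minimal: its only proper substring is
        # the empty string. Otherwise both length-(|u|-1) substrings must repeat.
        if len(u) == 1 or (counts[u[:-1]] >= 2 and counts[u[1:]] >= 2):
            result.append(u)
    return sorted(result)


print(minimal_unique_substrings("abaab"))  # ['aa', 'ba']
```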

arXiv:1906.05486 [pdf, other]

On Longest Common Property Preserved Substring Queries

Authors: Kazuki Kai, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Tomasz Kociumaka

Abstract: We revisit the problem of longest common property preserving substring queries introduced by Ayad et al. (SPIRE 2018, arXiv 2018). We consider a generalized and unified on-line setting, where we are given a set $X$ of $k$ strings of total length $n$ that can be pre-processed so that, given a query string $y$ and a positive integer $k'\leq k$, we can determine the longest substring of $y$ that satisfies some specific property and is common to at least $k'$ strings in $X$. Ayad et al. considered the longest square-free substring in an on-line setting and the longest periodic and palindromic substring in an off-line setting. In this paper, we give efficient solutions in the on-line setting for finding the longest common square, periodic, palindromic, and Lyndon substrings. More precisely, we show that $X$ can be pre-processed in $O(n)$ time resulting in a data structure of $O(n)$ size that answers queries in $O(|y|\logσ)$ time and $O(1)$ working space, where $σ$ is the size of the alphabet, and the common substring must be a square, a periodic substring, a palindrome, or a Lyndon word.

Submitted 13 June, 2019; originally announced June 2019.

Comments: minor change from version submitted to SPIRE 2019

arXiv:1906.00563 [pdf, other]

Direct Linear Time Construction of Parameterized Suffix and LCP Arrays for Constant Alphabets

Authors: Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be built first. More formally, for a string over static alphabet $Σ$ and parameterized alphabet $Π$, our algorithm runs in $O(nπ)$ time and $O(n)$ words of space, where $π$ is the number of distinct symbols of $Π$ in the string.

Submitted 3 June, 2019; originally announced June 2019.

Comments: submitted to SPIRE 2019

arXiv:1905.12854 [pdf, ps, other]

Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings

Authors: Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Given a string $T$ of length $n$, a substring $u = T[i..j]$ of $T$ is called a shortest unique substring (SUS) for an interval $[s,t]$ if (a) $u$ occurs exactly once in $T$, (b) $u$ contains the interval $[s,t]$ (i.e. $i \leq s \leq t \leq j$), and (c) every substring $v$ of $T$ with $|v| < |u|$ containing $[s,t]$ occurs at least twice in $T$. Given a query interval $[s, t] \subset [1, n]$, the interval SUS problem is to output all the SUSs for the interval $[s,t]$. In this article, we propose a $4n + o(n)$ bits data structure answering an interval SUS query in output-sensitive $O(\mathit{occ})$ time, where $\mathit{occ}$ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for $s = t$. Here, we propose a $\lceil (\log_2{3} + 1)n \rceil + o(n)$ bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of $T$.

Submitted 14 September, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

arXiv:1905.05002 [pdf, other]

A Compact Low-Latency Systematic Successive Cancellation Polar Decoder for Visible Light Communication Systems

Authors: Duc-Phuc Nguyen, Dinh-Dung Le, Thi-Hong Tran, Takashi Nakada, Yasuhiko Nakashima

Abstract: Channel polarization and Polar code are widely considered as major breakthroughs in coding theory because they have shown promising features for future wireless standards. The main drawbacks of Polar code are high latency in decoding hardware and unimpressive error-correction performance when a limited code length is used. These two disadvantages limit implementation of Polar code in low-throughput wireless communication systems. In this paper, we propose a low-complexity low-latency hardware architecture for the soft-decision compact (16,11) Systematic Successive Cancellation Polar Decoder (S-SCD). Experimental results have shown that the latency of the proposed S-SCD improves 3.75 times and 2.75 times compared with conventional and 2b-SC architectures. Besides, it has also shown a better BER/FER performance compared with RS(15,11) code, which is applied widely in current VLC-based systems.

Submitted 6 May, 2019; originally announced May 2019.

Comments: IEICE Technical Report, Vol.117, Issue 44, pp.3-7

arXiv:1904.10615 [pdf, other]

Understanding Art through Multi-Modal Retrieval in Paintings

Authors: Noa Garcia, Benjamin Renoust, Yuta Nakashima

Abstract: In computer vision, visual arts are often studied from a purely aesthetics perspective, mostly by analysing the visual appearance of an artistic reproduction to infer its style, its author, or its representative features. In this work, however, we explore art from both a visual and a language perspective. Our aim is to bridge the gap between the visual appearance of an artwork and its underlying meaning, by jointly analysing its aesthetics and its semantics. We introduce the use of multi-modal techniques in the field of automatic art analysis by 1) collecting a multi-modal dataset with fine-art paintings and comments, and 2) exploring robust visual and textual representations in artistic images.

Submitted 23 April, 2019; originally announced April 2019.

arXiv:1904.07467 [pdf, ps, other]

doi 10.1109/DCC47342.2020.00032

c-trie++: A Dynamic Trie Tailored for Fast Prefix Searches

Authors: Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Given a dynamic set $K$ of $k$ strings of total length $n$ whose characters are drawn from an alphabet of size $σ$, a keyword dictionary is a data structure built on $K$ that provides locate, prefix search, and update operations on $K$. Under the assumption that $α = w / \lg σ$ characters fit into a single machine word $w$, we propose a keyword dictionary that represents $K$ in $n \lg σ + Θ(k \lg n)$ bits of space, supporting all operations in $O(m / α + \lg α)$ expected time on an input string of length $m$ in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations.

Submitted 7 October, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

Journal ref: Full version of conference paper at DCC, pages 243-252, 2020

arXiv:1904.04985 [pdf, other]

doi 10.1145/3323873.3325028

Context-Aware Embeddings for Automatic Art Analysis

Authors: Noa Garcia, Benjamin Renoust, Yuta Nakashima

Abstract: Automatic art analysis aims to classify and retrieve artistic representations from a collection of images by using computer vision and machine learning techniques. In this work, we propose to enhance visual representations from neural networks with contextual artistic information. Whereas visual representations are able to capture information about the content and the style of an artwork, our proposed context-aware embeddings additionally encode relationships between different artistic attributes, such as author, school, or historical period. We design two different approaches for using context in automatic art analysis. In the first one, contextual data is obtained through a multi-task learning model, in which several attributes are trained together to find visual relationships between elements. In the second approach, context is obtained through an art-specific knowledge graph, which encodes relationships between artistic attributes. An exhaustive evaluation of both of our models in several art analysis problems, such as author identification, type classification, or cross-modal retrieval, shows that performance is improved by up to 7.3% in art classification and 37.24% in retrieval when context-aware embeddings are used.

Submitted 9 April, 2019; originally announced April 2019.

arXiv:1904.00832 [pdf, other]

Non-RLL DC-Balance based on a Pre-scrambled Polar Encoder for Beacon-based Visible Light Communication Systems

Authors: Duc-Phuc Nguyen, Dinh-Dung Le, Thi-Hong Tran, Yasuhiko Nakashima

Abstract: Current flicker mitigation (or DC-balance) solutions based on run-length limited (RLL) decoding algorithms are high in complexity, suffer from reduced code rates, or are limited in application to hard-decoding forward error correction (FEC) decoders. Fortunately, non-RLL DC-balance solutions can overcome the drawbacks of RLL-based algorithms, but they meet some difficulties in system latency, low code rate or inferior error-correction performance. Recently, a non-RLL flicker mitigation solution based on Polar code has proved to be an optimal approach due to its natural equal probabilities of short runs of 1's and 0's with high error-correction performance. However, we found that this solution can maintain DC balance only when the data frame length is sufficiently long. Therefore, these solutions are not suitable for use in beacon-based visible light communication (VLC) systems, which usually transmit ID information in small-size data frames. In this paper, we introduce a flicker mitigation solution designed for beacon-based VLC systems that combines a simple pre-scrambler with a (256;158) non-systematic polar encoder.

Submitted 29 March, 2019; originally announced April 2019.

Comments: to be published in Proceedings of ICEVLC (International Conference and Exhibition on Visible Light Communications). arXiv admin note: substantial text overlap with arXiv:1805.00359, arXiv:1805.03398

arXiv:1903.11328 [pdf, other]

Rethinking the Evaluation of Video Summaries

Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

Abstract: Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There exists a substantial interest in automatizing this process due to the rapid growth of the available material. The recent progress has been facilitated by public benchmark datasets, which enable easy and fair comparison of methods. Currently the established evaluation protocol is to compare the generated summary with respect to a set of reference summaries provided by the dataset. In this paper, we will provide in-depth assessment of this pipeline using two popular benchmark datasets. Surprisingly, we observe that randomly generated summaries achieve comparable or better performance to the state-of-the-art. In some cases, the random summaries outperform even the human generated summaries in leave-one-out experiments. Moreover, it turns out that the video segmentation, which is often considered as a fixed pre-processing method, has the most significant impact on the performance measure. Based on our observations, we propose alternative approaches for assessing the importance scores as well as an intuitive visualization of correlation between the estimated scoring and human annotations.

Submitted 11 April, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

Comments: CVPR'19 poster

arXiv:1903.06290 [pdf, other]

Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings

Authors: Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: For a string $S$, a palindromic substring $S[i..j]$ is said to be a \emph{shortest unique palindromic substring} ($\mathit{SUPS}$) for an interval $[s, t]$ in $S$, if $S[i..j]$ occurs exactly once in $S$, the interval $[i, j]$ contains $[s, t]$, and every palindromic substring containing $[s, t]$ which is shorter than $S[i..j]$ occurs at least twice in $S$. In this paper, we study the problem of answering $\mathit{SUPS}$ queries on run-length encoded strings. We show how to preprocess a given run-length encoded string $\mathit{RLE}_{S}$ of size $m$ in $O(m)$ space and $O(m \log σ_{\mathit{RLE}_{S}} + m \sqrt{\log m / \log\log m})$ time so that all $\mathit{SUPSs}$ for any subsequent query interval can be answered in $O(\sqrt{\log m / \log\log m} + α)$ time, where $α$ is the number of outputs, and $σ_{\mathit{RLE}_{S}}$ is the number of distinct runs of $\mathit{RLE}_{S}$. Additionally, we consider a variant of the SUPS problem where a query interval is also given in a run-length encoded form. For this variant of the problem, we present two alternative algorithms with faster queries. The first one answers queries in $O(\sqrt{\log\log m /\log\log\log m} + α)$ time and can be built in $O(m \log σ_{\mathit{RLE}_{S}} + m \sqrt{\log m / \log\log m})$ time, and the second one answers queries in $O(\log \log m + α)$ time and can be built in $O(m \log σ_{\mathit{RLE}_{S}})$ time. Both of these data structures require $O(m)$ space.

Submitted 23 March, 2020; v1 submitted 14 March, 2019; originally announced March 2019.
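
Illustration (not part of the listing): the abstract above measures its input by the size $m$ of the run-length encoding and by the number of distinct runs. A minimal sketch of those two quantities follows; the function name `run_length_encode` and the sample string are assumptions chosen for this example, and the code is not the authors' preprocessing algorithm.

```python
from itertools import groupby


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Return the maximal runs of s as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


runs = run_length_encode("aaabbaaac")
m = len(runs)                  # size of the run-length encoding
distinct_runs = len(set(runs))  # identical (character, length) pairs count once
```

Here the runs are a^3, b^2, a^3, c^1, so the encoding has four runs, of which three are distinct.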

arXiv:1903.06289 [pdf, ps, other]

The Parameterized Position Heap of a Trie

Authors: Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Let $Σ$ and $Π$ be disjoint alphabets of respective size $σ$ and $π$. Two strings over $Σ\cup Π$ of equal length are said to parameterized match (p-match) if there is a bijection $f:Σ\cup Π\rightarrow Σ\cup Π$ such that (1) $f$ is identity on $Σ$ and (2) $f$ maps the characters of one string to those of the other string so that the two strings become identical. We consider the p-matching problem on a (reversed) trie $\mathcal{T}$ and a string pattern $P$ such that every path that p-matches $P$ has to be reported. Let $N$ be the size of the given trie $\mathcal{T}$. In this paper, we propose the parameterized position heap for $\mathcal{T}$ that occupies $O(N)$ space and supports p-matching queries in $O(m \log (σ + π) + m π + \mathit{pocc})$ time, where $m$ is the length of a query pattern $P$ and $\mathit{pocc}$ is the number of paths in $\mathcal{T}$ to report. We also present an algorithm which constructs the parameterized position heap for a given trie $\mathcal{T}$ in $O(N (σ + π))$ time and working space.

Submitted 14 March, 2019; originally announced March 2019.
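
Illustration (not part of the listing): the p-match definition in the abstract above can be checked directly for two given strings. The sketch below is a naive pairwise test, not the position-heap index the paper proposes; the function name `p_match` and the choice to pass the static alphabet as a Python set are assumptions made for this example.

```python
def p_match(x: str, y: str, static: set[str]) -> bool:
    """Check whether x and y parameterized match.

    Characters in `static` must map to themselves; all other characters
    are treated as parameter symbols and may be renamed by a bijection.
    """
    if len(x) != len(y):
        return False
    forward: dict[str, str] = {}   # renaming from symbols of x to symbols of y
    backward: dict[str, str] = {}  # inverse renaming, used to enforce injectivity
    for a, b in zip(x, y):
        if a in static or b in static:
            # A static character can only be matched by itself.
            if a != b:
                return False
            continue
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

With static alphabet {a, b}, the strings `axbxy` and `aybyz` p-match under the renaming x to y, y to z, whereas `axbxy` and `aybzz` do not, because x would have to map to both y and z.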

arXiv:1901.10722 [pdf, ps, other]

Computing longest palindromic substring after single-character or block-wise edits

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of view. It is known that the length of the longest palindromic substrings (LPSs) of a given string $T$ of length $n$ can be computed in $O(n)$ time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LPS after the string is edited. We present an algorithm that uses $O(n)$ time and space for preprocessing, and answers the length of the LPSs in $O(\log (\min \{σ, \log n\}))$ time after a single character substitution, insertion, or deletion, where $σ$ denotes the number of distinct characters appearing in $T$. We also propose an algorithm that uses $O(n)$ time and space for preprocessing, and answers the length of the LPSs in $O(\ell + \log \log n)$ time, after an existing substring in $T$ is replaced by a string of arbitrary length $\ell$.

Submitted 8 January, 2021; v1 submitted 30 January, 2019; originally announced January 2019.
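
Illustration (not part of the listing): the abstract above takes as its starting point the linear-time computation of the longest palindromic substring length by Manacher's algorithm. A minimal sketch of that static baseline follows; it does not handle the edit queries the paper studies, and the function name `longest_palindrome_length` is an assumption for this example.

```python
def longest_palindrome_length(t: str) -> int:
    """Length of the longest palindromic substring of t (Manacher's algorithm)."""
    # Interleave a separator so odd- and even-length palindromes are handled alike.
    # Separators sit at even indices and original characters at odd indices, so a
    # separator is never compared against an original character.
    s = "#" + "#".join(t) + "#"
    n = len(s)
    radius = [0] * n      # radius[i]: palindrome radius around position i in s
    center = right = 0    # centre and right end of the rightmost palindrome seen
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirror position, clipped to the known range.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and s[lo] == s[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    # A radius in the interleaved string equals a palindrome length in t.
    return max(radius)
```

For example, the longest palindromic substring of `banana` is `anana`, of length 5.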

arXiv:1901.10633 [pdf, other]

Efficiently computing runs on a trie

Authors: Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where one endpoint of the path must be an ancestor of the other. For a trie with $n$ edges, we show that the number of runs is less than $n$. We also show an asymptotic lower bound on the maximum density of runs in tries: $\lim_{n\rightarrow\infty}ρ_\mathcal{T}(n)/n \geq 0.993238$ where $ρ_{\mathcal{T}}(n)$ is the maximum number of runs in a trie with $n$ edges. Furthermore, we also show an $O(n\log \log n)$ time and $O(n)$ space algorithm for finding all runs.

Submitted 20 April, 2021; v1 submitted 29 January, 2019; originally announced January 2019.

Comments: an updated version of CPM 2019 paper (10.4230/LIPIcs.CPM.2019.23), submitted to a journal

arXiv:1811.04596 [pdf, other]

MR-RePair: Grammar Compression based on Maximal Repeats

Authors: Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Abstract: We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

Submitted 18 February, 2019; v1 submitted 12 November, 2018; originally announced November 2018.

arXiv:1808.01071 [pdf, ps, other]

Right-to-left online construction of parameterized position heaps

Authors: Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Two strings of equal length are said to parameterized match if there is a bijection that maps the characters of one string to those of the other string, so that the two strings become identical. The parameterized pattern matching problem is, given two strings $T$ and $P$, to find the occurrences of substrings in $T$ that parameterized match $P$. Diptarama et al. [Position Heaps for Parameterized Strings, CPM 2017] proposed an indexing data structure called parameterized position heaps, and gave a left-to-right online construction algorithm. In this paper, we present a right-to-left online construction algorithm for parameterized position heaps. For a text string $T$ of length $n$ over two kinds of alphabets $Σ$ and $Π$ of respective size $σ$ and $π$, our construction algorithm runs in $O(n \log(σ + π))$ time with $O(n)$ space. Our right-to-left parameterized position heaps support pattern matching queries in $O(m \log (σ + π) + m π + \mathit{pocc})$ time, where $m$ is the length of a query pattern $P$ and $\mathit{pocc}$ is the number of occurrences to report. Our construction and pattern matching algorithms are as efficient as Diptarama et al.'s algorithms.

Submitted 2 August, 2018; originally announced August 2018.

arXiv:1807.02632 [pdf, other]

Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation

Authors: Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi

Abstract: Reconstruction of the shape and motion of humans from RGB-D is a challenging problem, receiving much attention in recent years. Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes, to complete invisible parts due to occlusion. Such a statistical model may still be fit to an RGB-D measurement with loose clothes but cannot describe its deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower dimensional embeddings of texture and deformation referred to as eigen-texturing and eigen-deformation, to reproduce views of even unobserved surfaces. Provided a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are then embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how our method works on real data.

Submitted 7 July, 2018; originally announced July 2018.

Comments: 6 pages, accepted to ICPR

arXiv:1806.04890 [pdf, ps, other]

$O(n \log n)$-time text compression by LZ-style longest first substitution

Authors: Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Abstract: Mauer et al. [A Lempel-Ziv-style Compression Method for Repetitive Texts, PSC 2017] proposed a hybrid text compression method called LZ-LFS which has both features of Lempel-Ziv 77 factorization and longest first substitution. They showed that LZ-LFS can achieve better compression ratio for repetitive texts, compared to some state-of-the-art compression algorithms. The drawback of Mauer et al.'s method is that their LZ-LFS compression algorithm takes $O(n^2)$ time on an input string of length $n$. In this paper, we show a faster LZ-LFS compression algorithm that works in $O(n \log n)$ time. We also propose a simpler version of LZ-LFS that can be computed in $O(n)$ time.

Submitted 13 June, 2018; originally announced June 2018.

arXiv:1806.04284 [pdf, other]

iParaphrasing: Extracting Visually Grounded Paraphrases via an Image

Authors: Chenhui Chu, Mayu Otani, Yuta Nakashima

Abstract: A paraphrase is a restatement of the meaning of a text in other words. Paraphrases have been studied to enhance the performance of many natural language processing tasks. In this paper, we propose a novel task iParaphrasing to extract visually grounded paraphrases (VGPs), which are different phrasal expressions describing the same visual concept in an image. These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning. How to model the similarity between VGPs is the key to iParaphrasing. We apply various existing methods as well as propose a novel neural network-based method with image attention, and report the results of the first attempt toward iParaphrasing.

Submitted 11 June, 2018; originally announced June 2018.

Comments: COLING 2018

arXiv:1805.03398 [pdf, other]

VLSI Architecture of Compact Non-RLL Beacon-based Visible Light Communication Transmitter and Receiver

Authors: Duc-Phuc Nguyen, Dinh-Dung Le, Thi-Hong Tran, Huu-Thuan Huynh, Yasuhiko Nakashima

Abstract: In this paper, we introduce a couple of hardware implementations of a compact VLC transmitter and receiver for the first time. Compared with related works, our VLC transmitter is a non-RLL one, which means flicker mitigation can be guaranteed even without RLL codes. In particular, we have utilized a centralized bit probability distribution of a prescrambler and a Polar encoder to create a non-RLL flicker mitigation solution. Moreover, at the receiver, a 3-bit soft-decision filter is proposed to analyze signals received from a real VLC channel, extract log-likelihood ratio (LLR) values, and feed them to the FEC decoder. Therefore, soft-decoding of the Polar decoder can be implemented to improve the bit-error-rate (BER) performance of the VLC system. Finally, we introduce a novel very large scale integration (VLSI) architecture for the compact VLC transmitter and receiver, and synthesize our design with FPGA/ASIC synthesis tools. Owing to its non-RLL basis, our system has an evidently good code rate and reduced complexity compared with other RLL-based receiver works. Also, we present FPGA and ASIC synthesis results of the proposed architecture with evaluations of power consumption, area, energy per bit, and so on.

Submitted 9 May, 2018; originally announced May 2018.

Comments: Being reviewed by EURASIP Journal of Wireless Communication and Networking

arXiv:1805.00359 [pdf, other]

Hardware Implementation of A Non-RLL Soft-decoding Beacon-based Visible Light Communication Receiver

Authors: Duc-Phuc Nguyen, Dinh-Dung Le, Thi-Hong Tran, Huu-Thuan Huynh, Yasuhiko Nakashima

Abstract: Visible light communication (VLC)-based beacon systems, which usually transmit identification (ID) information in small-size data frames, are applied widely in indoor localization applications. Flicker of LED light should be avoided in any VLC system. Current flicker mitigation solutions based on run-length limited (RLL) codes suffer from reduced code rates, or are limited to hard-decoding forward error correction (FEC) decoders. Recently, soft-decoding techniques for RLL codes have been proposed to support soft-decoding FEC algorithms, but they can involve high-complexity and time-consuming computations. Non-RLL direct current (DC)-balance solutions can overcome the drawbacks of RLL-based algorithms; however, they meet some difficulties in system latency or inferior error-correction performance. Recently, a non-RLL flicker mitigation solution based on Polar code has proved to be an optimal approach due to its natural equal probabilities of short runs of 1's and 0's with high error-correction performance. However, we found that this solution can maintain the DC balance only when the data frame length is sufficiently long. Accordingly, short beacon-based data frames might still be a big challenge for flicker mitigation in such non-RLL cases. In this paper, we introduce a flicker mitigation solution designed for VLC-based beacon systems that combines a simple pre-scrambler with a Polar encoder which has a codeword 8 times smaller than that of the previous work. We also propose a hardware architecture for the proposed compact non-RLL VLC receiver for the first time. Also, a 3-bit soft-decision filter is introduced to enable soft-decoding of the Polar decoder to improve the performance of the receiver.

Submitted 29 May, 2018; v1 submitted 27 April, 2018; originally announced May 2018.

Comments: In review process of ATC'18, HCMC, Vietnam

arXiv:1709.08421 [pdf, other]

Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

Authors: Antonio Tejero-de-Pablos, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya, Marko Linna, Esa Rahtu

Abstract: Automatically generating a summary of sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited, and thus traditional methods are not suitable to generate a summary. In order to solve this problem, this work proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural network-based approach is used to extract two types of action-related features and to classify video segments into interesting or uninteresting parts. The proposed method can be applied to any sport in which games consist of a succession of actions. In particular, this work considers the case of Kendo (Japanese fencing) as an example of a sport to evaluate the proposed method. The method is trained using Kendo videos with ground truth labels that indicate the video highlights. The labels are provided by annotators possessing different experience with respect to Kendo to demonstrate how the proposed method adapts to different needs. The performance of the proposed method is compared with several combinations of different features, and the results show that it outperforms previous summarization methods.

Submitted 13 April, 2018; v1 submitted 25 September, 2017; originally announced September 2017.

Comments: 12 pages, 8 figures, 4 tables

MSC Class: 68T45

arXiv:1611.08898 [pdf, other]

doi 10.4230/LIPIcs.STACS.2017.45

On the Size of Lempel-Ziv and Lyndon Factorizations

Authors: Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, Arseny M. Shur

Abstract: Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describing a new, non-trivial family of strings) it is never more than twice the size.

Submitted 27 November, 2016; originally announced November 2016.

Comments: 12 pages

arXiv:1609.08758 [pdf, other]

Video Summarization using Deep Semantic Features

Authors: Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

Abstract: This paper presents a video summarization technique for an Internet video to provide a quick way to overview its content. This is a challenging problem because finding important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization much tougher as prior knowledge is rarely available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries using the SumMe dataset as well as baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features in a video summarization technique.

Submitted 27 September, 2016; originally announced September 2016.

Comments: 16 pages, the 13th Asian Conference on Computer Vision (ACCV'16)

Showing 51–100 of 122 results for author: Nakashima, Y