Showing 1–10 of 10 results for author: Atienza, R

  1. Scene Text Recognition Models Explainability Using Local Features

    Authors: Mark Vincent Ty, Rowel Atienza

    Abstract: Explainable AI (XAI) is the study on how humans can be able to understand the cause of a model's prediction. In this work, the problem of interest is Scene Text Recognition (STR) Explainability, using XAI to understand the cause of an STR model's prediction. Recent XAI literatures on STR only provide a simple analysis and do not fully explore other XAI methods. In this study, we specifically work…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2023

  2. arXiv:2305.13905 [pdf, other]

    eess.AS cs.CL cs.SD

    EfficientSpeech: An On-Device Text to Speech Model

    Authors: Rowel Atienza

    Abstract: State of the art (SOTA) neural text to speech (TTS) models can generate natural-sounding synthetic voices. These models are characterized by large memory footprints and substantial number of operations due to the long-standing focus on speech quality with cloud inference in mind. Neural TTS models are generally not designed to perform standalone speech syntheses on resource-constrained and no Inte…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: To be presented at ICASSP 2023

  3. arXiv:2207.06966 [pdf, other]

    cs.CV cs.CL

    Scene Text Recognition with Permuted Autoregressive Sequence Models

    Authors: Darwin Bautista, Rowel Atienza

    Abstract: Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR L…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted at the 17th European Conference on Computer Vision (ECCV 2022)

  4. arXiv:2204.10546 [pdf, other]

    cs.LG cs.CV

    Depth Pruning with Auxiliary Networks for TinyML

    Authors: Josen Daniel De Leon, Rowel Atienza

    Abstract: Pruning is a neural network optimization technique that sacrifices accuracy in exchange for lower computational requirements. Pruning has been useful when working with extremely constrained environments in tinyML. Unfortunately, special hardware requirements and limited study on its effectiveness on already compact models prevent its wider adoption. Depth pruning is a form of pruning that requires…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: To be published in International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2022

  5. arXiv:2110.10536 [pdf, other]

    cs.CV

    Improving Model Generalization by Agreement of Learned Representations from Data Augmentation

    Authors: Rowel Atienza

    Abstract: Data augmentation reduces the generalization error by forcing a model to learn invariant representations given different transformations of the input image. In computer vision, on top of the standard image processing functions, data augmentation techniques based on regional dropout such as CutOut, MixUp, and CutMix and policy-based selection such as AutoAugment demonstrated state-of-the-art (SOTA)…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted at WACV 2022

  6. arXiv:2108.06949 [pdf, other]

    cs.CV

    Data Augmentation for Scene Text Recognition

    Authors: Rowel Atienza

    Abstract: Scene text recognition (STR) is a challenging task in computer vision due to the large number of possible text appearances in natural scenes. Most STR models rely on synthetic datasets for training since there are no sufficiently big and publicly available labelled real datasets. Since STR models are evaluated using real data, the mismatch between training and testing data distributions results in…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: Interactive Labeling and Data Augmentation for Vision ICCV 2021 Workshop

  7. arXiv:2105.10793 [pdf, other]

    cs.CV

    GOO: A Dataset for Gaze Object Prediction in Retail Environments

    Authors: Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando, Joel Casimiro, Rowel Atienza, Richard Guinto

    Abstract: One of the most fundamental and information-laden actions humans do is to look at objects. However, a survey of current works reveals that existing gaze-related datasets annotate only the pixel being looked at, and not the boundaries of a specific object of interest. This lack of object annotation presents an opportunity for further advancing gaze estimation research. To this end, we present a cha…

    Submitted 21 June, 2021; v1 submitted 22 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2021)

  8. arXiv:2105.08582 [pdf, other]

    cs.CV

    Vision Transformer for Fast and Efficient Scene Text Recognition

    Authors: Rowel Atienza

    Abstract: Scene text recognition (STR) enables computers to read text in natural scenes such as object labels, road signs and instructions. STR helps machines perform informed decisions such as what object to pick, which direction to go, and what is the next step of action. In the body of work on STR, the focus has always been on recognition accuracy. There is little emphasis placed on speed and computation…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: To appear at ICDAR 2021, Springer Lecture Notes in Computer Science series

    Journal ref: ICDAR2021

  9. arXiv:2008.12664 [pdf, other]

    cs.CV

    Next-Best View Policy for 3D Reconstruction

    Authors: Daryl Peralta, Joel Casimiro, Aldrin Michael Nilles, Justine Aletta Aguilar, Rowel Atienza, Rhandley Cajote

    Abstract: Manually selecting viewpoints or using commonly available flight planners like circular path for large-scale 3D reconstruction using drones often results in incomplete 3D models. Recent works have relied on hand-engineered heuristics such as information gain to select the Next-Best Views. In this work, we present a learning-based algorithm called Scan-RL to learn a Next-Best View (NBV) Policy. To…

    Submitted 6 September, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: To be published in ECCV 2020 Workshops; typos in abstract corrected

  10. arXiv:1805.07499 [pdf, other]

    cs.CV cs.RO

    Fast Disparity Estimation using Dense Networks

    Authors: Rowel Atienza

    Abstract: Disparity estimation is a difficult problem in stereo vision because the correspondence technique fails in images with textureless and repetitive regions. Recent body of work using deep convolutional neural networks (CNN) overcomes this problem with semantics. Most CNN implementations use an autoencoder method; stereo images are encoded, merged and finally decoded to predict the disparity map. In…

    Submitted 18 May, 2018; originally announced May 2018.

    Comments: In Proc. International Conference on Robotics and Automation 2018 (ICRA2018)