Search | arXiv e-print repository

doi 10.1109/ACCESS.2024.3404834

MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition

Authors: Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy

Abstract: Printed mathematical expression recognition (MER) models are usually trained and tested using LaTeX-generated mathematical expressions (MEs) as input and the LaTeX source code as ground truth. As the same ME can be generated by various different LaTeX source codes, this leads to unwanted variations in the ground truth data that bias test performance results and hinder efficient learning. In addition, the use of only one font to generate the MEs heavily limits the generalization of the reported results to realistic scenarios. We propose a data-centric approach to overcome this problem, and present convincing experimental results: Our main contribution is an enhanced LaTeX normalization to map any LaTeX ME to a canonical form. Based on this process, we developed an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one. Second, we introduce the real-world dataset realFormula, with MEs extracted from papers. Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets (im2latex-100k, im2latexv2, realFormula, and InftyMDB-1), outperforming the previous state of the art by up to 88.3%.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures

Journal ref: IEEE Access 12 (2024) 76963-76974

arXiv:2305.14041

The state of scientific PDF accessibility in repositories: A survey in Switzerland

Authors: Alireza Darvishy, Rolf Sethe, Ines Engler, Oriane Pierres, Juliet Manning

Abstract: This survey analyzed the quality of the PDF documents on online repositories in Switzerland, examining their accessibility for people with visual impairments. Two minimal accessibility features were analyzed: the PDFs had to have tags and a hierarchical heading structure. The survey also included interviews with the managers or heads of multiple Swiss universities' repositories to assess the general opinion and knowledge of PDF accessibility. An analysis of interviewee responses indicates an overall lack of awareness of PDF accessibility, and shows that online repositories currently have no concrete plans to address the issue. This paper concludes by presenting a set of recommendations for online repositories to improve the accessibility of their PDF documents.

Submitted 14 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: We need to modify this paper and make some extensions before re-uploading

arXiv:2301.02546 [pdf]

A new conversational interaction concept for document creation and editing on mobile devices for visually impaired users

Authors: Alireza Darvishy, Hans-Peter Hutter, Edin Beljulji, Zeno Heeb

Abstract: This paper describes the ongoing development of a conversational interaction concept that allows visually impaired users to easily create and edit text documents on mobile devices using mainly voice input. In order to verify the concept, a prototype app was developed and tested for both iOS and Android systems, based on the natural-language understanding (NLU) platform Google Dialogflow. The app and interaction concept were repeatedly tested by users with and without visual impairments. Based on their feedback, the concept was continuously refined, adapted and improved on both mobile platforms. In an iterative user-centred design approach, the following research question was investigated: Can a visually impaired user rely mainly on speech commands to efficiently create and edit a document on mobile devices? User testing found that an interaction concept based on conversational speech commands was easy and intuitive for visually impaired users. However, it was also found that relying on speech commands alone created its own obstacles, and that a combination of gestures and voice interaction would be more robust. Future research and more extensive usability tests should be carried out among visually impaired users in order to optimize the interaction concept.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2212.04745 [pdf, other]

SLAM for Visually Impaired People: a Survey

Authors: Marziyeh Bamdad, Davide Scaramuzza, Alireza Darvishy

Abstract: In recent decades, several assistive technologies have been developed to improve the ability of blind and visually impaired individuals to navigate independently and safely. At the same time, simultaneous localization and mapping (SLAM) techniques have become sufficiently robust and efficient to be adopted in developing these assistive technologies. We present the first systematic literature review of 54 recent studies on SLAM-based solutions for blind and visually impaired people, focusing on literature published from 2017 onward. This review explores various localization and mapping techniques employed in this context. We systematically identified and categorized diverse SLAM approaches and analyzed their localization and mapping techniques, sensor types, computing resources, and machine-learning methods. We discuss the advantages and limitations of these techniques for blind and visually impaired navigation. Moreover, we examine the major challenges described across studies, including practical considerations that affect usability and adoption. Our analysis also evaluates the effectiveness of these SLAM-based solutions in real-world scenarios and user satisfaction, providing insights into their practical impact on BVI mobility. The insights derived from this review identify critical gaps and opportunities for future research activities, particularly in addressing the challenges presented by dynamic and complex environments. We explain how SLAM technology offers the potential to improve the ability of visually impaired individuals to navigate effectively. Finally, we present future opportunities and challenges in this domain.

Submitted 24 May, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 45 pages, 38 tables, 6 figures

Showing 1–4 of 4 results for author: Darvishy, A