Showing 1–11 of 11 results for author: Jia, F

Searching in archive eess.
  1. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, ** Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  2. arXiv:2404.04295  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

    Authors: Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg

    Abstract: This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that P…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: accepted at the ICASSP 2024 conference

  3. arXiv:2304.06795  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

    Authors: Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

    Abstract: This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by jointly predicting both a token and its duration, i.e. the number of input frames covered by the emitted token. This is achieved by using a joint network with two outputs which are independently normalized to generate distributions…

    Submitted 29 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  4. arXiv:2211.05103  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

    Authors: Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg

    Abstract: In this paper, we extend previous self-supervised approaches for language identification by experimenting with Conformer based architecture in a multilingual pre-training paradigm. We find that pre-trained speech models optimally encode language discriminatory information in lower layers. Further, we demonstrate that the embeddings obtained from these layers are significantly robust to classify un…

    Submitted 13 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  5. arXiv:2211.03541  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-blank Transducers for Speech Recognition

    Authors: Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

    Abstract: This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR). In standard RNN-T, the emission of a blank symbol consumes exactly one input frame; in our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted. We refer to the added symbols as big blanks, and the method multi-blank RNN-T. For training…

    Submitted 11 April, 2024; v1 submitted 4 November, 2022; originally announced November 2022.

    Journal ref: ICASSP 2023

  6. arXiv:2210.15781  [pdf, other]

    eess.AS cs.CL cs.SD

    A Compact End-to-End Model with Local and Global Context for Spoken Language Identification

    Authors: Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language Identification (LID) that is based on the ContextNet architecture. TitaNet-LID employs 1D depth-wise separable convolutions and Squeeze-and-Excitation layers to effectively capture local and global context within an utterance. Despite its small size, TitaNet-LID achieves performance similar to state-of-the-art models…

    Submitted 10 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted to INTERSPEECH 2023

  7. arXiv:2110.10965  [pdf, other]

    eess.IV cs.CV

    2020 CATARACTS Semantic Segmentation Challenge

    Authors: Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heon** Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M. Nunez Do Rio, et al. (15 additional authors not shown)

    Abstract: Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presenc…

    Submitted 24 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

  8. arXiv:2109.14956  [pdf]

    eess.IV cs.CV cs.LG

    Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

    Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming **, Yonghao Long, et al. (16 additional authors not shown)

    Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported fo…

    Submitted 30 September, 2021; originally announced September 2021.

  9. arXiv:2106.05735  [pdf, other]

    eess.IV cs.CV cs.LG

    The Medical Segmentation Decathlon

    Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

    Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical pro…

    Submitted 10 June, 2021; originally announced June 2021.

    MSC Class: 68T07

  10. Improving Physical Layer Security for Reconfigurable Intelligent Surface aided NOMA 6G Networks

    Authors: Zhe Zhang, Chensi Zhang, Chengjun Jiang, Fan Jia, Jianhua Ge, Fengkui Gong

    Abstract: The intrinsic integration of the nonorthogonal multiple access (NOMA) and reconfigurable intelligent surface (RIS) techniques is envisioned to be a promising approach to significantly improve both the spectrum efficiency and energy efficiency for future wireless communication networks. In this paper, the physical layer security (PLS) for RIS-aided NOMA 6G networks is investigated, in which a RIS…

    Submitted 18 January, 2021; originally announced January 2021.

  11. arXiv:2010.13886  [pdf, other]

    eess.AS cs.SD

    MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

    Authors: Fei Jia, Somshubra Majumdar, Boris Ginsburg

    Abstract: We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD). MarbleNet is a deep residual network composed from blocks of 1D time-channel separable convolution, batch-normalization, ReLU and dropout layers. When compared to a state-of-the-art VAD model, MarbleNet is able to achieve similar performance with roughly 1/10-th the parameter cost. We further conduct extensive a…

    Submitted 11 February, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted to ICASSP 2021