
Showing 1–27 of 27 results for author: Singh, M K

Searching in archive cs.
  1. arXiv:2406.08816  [pdf, other]

    cs.CV

    ToSA: Token Selective Attention for Efficient Vision Transformers

    Authors: Manish Kumar Singh, Rajeev Yasarla, Hong Cai, Mingu Lee, Fatih Porikli

    Abstract: In this paper, we propose a novel token selective attention approach, ToSA, which can identify tokens that need to be attended as well as those that can skip a transformer layer. More specifically, a token selector parses the current attention maps and predicts the attention maps for the next layer, which are then used to select the important tokens that should participate in the attention operati…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPRW 2024

  2. arXiv:2406.05505  [pdf, other]

    cs.IR cs.AI

    I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations

    Authors: Mohit Kumar Singh, Georgina Cosma, Patrick Waterson, Jonathan Back, Gyuchan Thomas Jun

    Abstract: Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g. individuals' decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g. health cond…

    Submitted 8 June, 2024; originally announced June 2024.

  3. arXiv:2406.03822  [pdf, other]

    cs.SD cs.CR eess.AS

    SilentCipher: Deep Audio Watermarking

    Authors: Mayank Kumar Singh, Naoya Takahashi, Weihsiang Liao, Yuki Mitsufuji

    Abstract: In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restrict their usage in professional settings. In this study…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2403.12953  [pdf, other]

    cs.CV

    FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

    Authors: Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli

    Abstract: In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame fea…

    Submitted 19 March, 2024; originally announced March 2024.

  5. arXiv:2403.12202  [pdf, other]

    cs.CV

    DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

    Authors: Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli

    Abstract: In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  6. arXiv:2311.10794  [pdf, other]

    cs.CV

    Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

    Authors: Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

    Abstract: We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that r…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  7. arXiv:2309.15807  [pdf, other]

    cs.CV

    Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

    Authors: Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, et al. (1 additional author not shown)

    Abstract: Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusivel…

    Submitted 27 September, 2023; originally announced September 2023.

  8. arXiv:2305.15055  [pdf, other]

    cs.SD cs.AI eess.AS

    Iteratively Improving Speech Recognition and Voice Conversion

    Authors: Mayank Kumar Singh, Naoya Takahashi, Onoe Naoyuki

    Abstract: Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples. However, for low-resource domains, training a high-quality ASR model remains a challenging task. In this work, we propose a novel iterative way of improving both the ASR and VC models. We first train an ASR model which i…

    Submitted 24 May, 2023; originally announced May 2023.

  9. arXiv:2302.13838  [pdf, other]

    cs.CV cs.SD eess.AS

    Cross-modal Face- and Voice-style Transfer

    Authors: Naoya Takahashi, Mayank K. Singh, Yuki Mitsufuji

    Abstract: Image-to-image translation and voice conversion enable the generation of a new facial image and voice while maintaining some of the semantics such as a pose in an image and linguistic content in audio, respectively. They can aid in the content-creation process in many applications. However, as they are limited to the conversion within each modality, matching the impression of the generated face an…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  10. arXiv:2302.10536  [pdf, other]

    cs.SD cs.AI eess.AS

    Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

    Authors: Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

    Abstract: The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emotion combinations only. In this paper, we tackle the problem of converting the emotion of speakers whose only neutral data ar…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Demo Samples at https://demosamplesites.github.io/EVCUP/

  11. arXiv:2301.09776  [pdf, ps, other]

    eess.IV cs.IT cs.LG cs.MM

    Differentiable bit-rate estimation for neural-based video codec enhancement

    Authors: Amir Said, Manish Kumar Singh, Reza Pourreza

    Abstract: Neural networks (NN) can improve standard video compression by pre- and post-processing the encoded video. For optimal NN training, the standard codec needs to be replaced with a codec proxy that can provide derivatives of estimated bit-rate and distortion, which are used for gradient back-propagation. Since entropy coding of standard codecs is designed to take into account non-linear dependencies…

    Submitted 23 January, 2023; originally announced January 2023.

    Journal ref: Picture Coding Symposium (PCS), San Jose, CA, USA, 2022, pp. 379-383

  12. arXiv:2301.01380  [pdf, other]

    cs.CV

    Ego-Only: Egocentric Action Detection without Exocentric Transferring

    Authors: Huiyu Wang, Mitesh Kumar Singh, Lorenzo Torresani

    Abstract: We present Ego-Only, the first approach that enables state-of-the-art action detection on egocentric (first-person) videos without any form of exocentric (third-person) transferring. Despite the content and appearance gap separating the two domains, large-scale exocentric transferring has been the default choice for egocentric action detection. This is because prior works found that egocentric mod…

    Submitted 19 May, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

  13. arXiv:2210.11096  [pdf, other]

    cs.SD cs.LG eess.AS

    Robust One-Shot Singing Voice Conversion

    Authors: Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

    Abstract: Recent progress in deep generative models has improved the quality of voice conversion in the speech domain. However, high-quality singing voice conversion (SVC) of unseen singers remains challenging due to the wider variety of musical expressions in pitch, loudness, and pronunciation. Moreover, singing voices are often recorded with reverb and accompaniment music, which make SVC even more challen…

    Submitted 6 October, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

  14. arXiv:2203.11556  [pdf, other]

    cs.LG cs.AI stat.ML

    VQ-Flows: Vector Quantized Local Normalizing Flows

    Authors: Sahil Sidheekh, Chris B. Dock, Tushar Jain, Radu Balan, Maneesh K. Singh

    Abstract: Normalizing flows provide an elegant approach to generative modeling that allows for efficient sampling and exact density evaluation of unknown data distributions. However, current techniques have significant limitations in their expressivity when the data distribution is supported on a low-dimensional manifold or has a non-trivial topology. We introduce a novel statistical framework for learning…

    Submitted 18 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Accepted to The 38th Conference on Uncertainty in Artificial Intelligence (UAI) 2022

  15. arXiv:2110.05054  [pdf, other]

    cs.SD cs.CR eess.AS

    Source Mixing and Separation Robust Audio Steganography

    Authors: Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

    Abstract: Audio steganography aims at concealing secret information in carrier audio with imperceptible modification on the carrier. Although previous works addressed the robustness of concealed message recovery against distortions introduced during transmission, they do not address the robustness against aggressive editing such as mixing of other audio sources and source separation. In this work, we propos…

    Submitted 17 February, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  16. arXiv:2101.06842  [pdf, other]

    cs.SD cs.LG eess.AS

    Hierarchical disentangled representation learning for singing voice conversion

    Authors: Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

    Abstract: Conventional singing voice conversion (SVC) methods often suffer from operating in high-resolution audio owing to a high dimensionality of data. In this paper, we propose a hierarchical representation learning that enables the learning of disentangled representations with multiple resolutions independently. With the learned disentangled representations, the proposed method progressively performs S…

    Submitted 25 April, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: accepted at IJCNN 2021

  17. arXiv:2010.15390  [pdf, other]

    cs.LG stat.ML

    Multitask Bandit Learning Through Heterogeneous Feedback Aggregation

    Authors: Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel D. Riek, Kamalika Chaudhuri

    Abstract: In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $ε$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily id…

    Submitted 19 July, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

    Journal ref: In International Conference on Artificial Intelligence and Statistics (pp. 1531-1539). PMLR (2021, March)

  18. arXiv:2007.13524  [pdf, other]

    cs.LG stat.ML

    Dynamic Relational Inference in Multi-Agent Trajectories

    Authors: Ruichao Xiao, Manish Kumar Singh, Rose Yu

    Abstract: Inferring interactions from multi-agent trajectories has broad applications in physics, vision and robotics. Neural relational inference (NRI) is a deep generative model that can reason about relations in complex dynamics without supervision. In this paper, we take a careful look at this approach for relational inference in multi-agent trajectories. First, we discover that NRI can be fundamentally…

    Submitted 8 October, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: submitted to ICLR 2021

  19. arXiv:2005.12147  [pdf, other]

    cs.LG cs.CL cs.CV

    NENET: An Edge Learnable Network for Link Prediction in Scene Text

    Authors: Mayank Kumar Singh, Sayan Banerjee, Shubhasis Chaudhuri

    Abstract: Text detection in scenes based on deep neural networks has shown promising results. Instead of using word bounding box regression, recent state-of-the-art methods have started focusing on character bounding box and pixel-level prediction. This creates the need to link adjacent characters, which we propose in this paper using a novel Graph Neural Network (GNN) architecture that allows us to l…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: 9 pages

  20. arXiv:2004.13945  [pdf, other]

    cs.CL

    Linguistic Resources for Bhojpuri, Magahi and Maithili: Statistics about them, their Similarity Estimates, and Baselines for Three Applications

    Authors: Rajesh Kumar Mundotiya, Manish Kumar Singh, Rahul Kapur, Swasti Mishra, Anil Kumar Singh

    Abstract: Corpus preparation for low-resource languages and for development of human language technology to analyze or computationally process them is a laborious task, primarily due to the unavailability of expert linguists who are native speakers of these languages and also due to the time and resources required. Bhojpuri, Magahi, and Maithili, languages of the Purvanchal region of India (in the north-eas…

    Submitted 17 August, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: ACM Transactions on Asian and Low-Resource Language Information Processing (Accepted)

  21. arXiv:1911.12928  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Voice Separation by Incorporating End-to-end Speech Recognition

    Authors: Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji

    Abstract: Despite recent advances in voice separation methods, many challenges remain in realistic scenarios such as noisy recording and the limits of available data. In this work, we propose to explicitly incorporate the phonetic and linguistic nature of speech by taking a transfer learning approach using an end-to-end automatic speech recognition (E2EASR) system. The voice separation is conditioned on dee…

    Submitted 3 May, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted in ICASSP 2020

  22. arXiv:1808.00948  [pdf, other]

    cs.CV

    Diverse Image-to-Image Translation via Disentangled Representations

    Authors: Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang

    Abstract: Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose…

    Submitted 2 August, 2018; originally announced August 2018.

    Comments: ECCV 2018 (Oral). Project page: http://vllab.ucmerced.edu/hylee/DRIT/ Code: https://github.com/HsinYingLee/DRIT/

  23. The Forecasting of 3G Market in India Based on Revised Technology Acceptance Model

    Authors: Sudha Singh, D. K. Singh, M. K. Singh, Sujeet Kumar Singh

    Abstract: 3G, successor of 2G services, is a family of standards for mobile telecommunications defined by the International Telecommunication Union [1]. 3G services include wide-area wireless voice telephone, video calls, and wireless data, all in a mobile environment. It allows simultaneous use of speech and data services and higher data rates. 3G is defined to facilitate growth, increased bandwidth and sup…

    Submitted 18 June, 2010; originally announced June 2010.

    Comments: 8 Pages

    Journal ref: International Journal of Next-Generation Networks 2.2 (2010) 61-68

  24. arXiv:1004.1708  [pdf]

    cs.SE

    Mathematical Principles in Software Quality Engineering

    Authors: Manoranjan Kumar Singh, Rakesh. L

    Abstract: Mathematics has many useful properties for developing complex software systems. One is that it can exactly describe a physical situation of the object or outcome of an action. Mathematics supports abstraction, and this is an excellent medium for modeling; since it is an exact medium, there is little possibility of ambiguity. This paper demonstrates that mathematics provides a high level of valid…

    Submitted 10 April, 2010; originally announced April 2010.

    Comments: IEEE Publication format, ISSN 1947 5500, http://sites.google.com/site/ijcsis/

    Journal ref: IJCSIS, Vol. 7 No. 3, March 2010, 178-184

  25. Node Isolation Probability of Wireless Adhoc Networks in Nagakami Fading Channel

    Authors: A. V. Babu, Mukesh Kumar Singh

    Abstract: This paper investigates the issue of connectivity of a wireless adhoc network in the presence of channel impairments. We derive analytical expressions for the node isolation probability in an adhoc network in the presence of Nakagami-m fading with superimposed lognormal shadowing. The node isolation probability is the probability that a randomly chosen node is not able to communicate with any of…

    Submitted 16 March, 2010; originally announced March 2010.

    Comments: 16 pages, IJCNC Journal

    Journal ref: International Journal of Computer Networks & Communications 2.2 (2010) 21-36

  26. arXiv:1002.4004  [pdf]

    cs.NE

    Nature inspired artificial intelligence based adaptive traffic flow distribution in computer network

    Authors: Manoj Kumar Singh

    Abstract: Because of the stochastic nature of the traffic requirement matrix, it is very difficult to get the optimal traffic distribution to minimize the delay, even with an adaptive routing protocol, in a fixed connection network where capacity is already defined for each link. Hence there is a requirement to define a method that could generate the optimal solution very quickly and efficiently. This paper pr…

    Submitted 21 February, 2010; originally announced February 2010.

    Journal ref: Journal of Computing, Volume 2, Issue 2, February 2010, https://sites.google.com/site/journalofcomputing/

  27. arXiv:0910.1838  [pdf]

    cs.CR cs.NE

    Password Based a Generalize Robust Security System Design Using Neural Network

    Authors: Manoj Kumar Singh

    Abstract: Among the various means of available resource protection, including biometrics, a password-based system is the most simple, user friendly, cost effective and commonly used. But this method is highly sensitive to attacks. Most of the advanced methods for password-based authentication encrypt the contents of the password before storing or transmitting it in the physical domain. But all conventional cryptog…

    Submitted 9 October, 2009; originally announced October 2009.

    Comments: International Journal of Computer Science Issues, IJCSI, Volume 4, Issue 2, pp1-9, September 2009

    Journal ref: M.K Singh, "Password Based A Generalize Robust Security System Design Using Neural Network", International Journal of Computer Science Issues, IJCSI, Volume 4, Issue 2, pp1-9, September 2009