Showing 1–18 of 18 results for author: Bose, D

  1. arXiv:2402.09036  [pdf, other]

    cs.CV

    Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?

    Authors: Tiantian Feng, Daniel Yang, Digbalay Bose, Shrikanth Narayanan

    Abstract: Multi-modal learning has emerged as an increasingly promising avenue in vision recognition, driving innovations across diverse domains ranging from media and education to healthcare and transportation. Despite its success, the robustness of multi-modal learning for visual recognition is often challenged by the unavailability of a subset of modalities, especially the visual modality. Conventional a…

    Submitted 14 February, 2024; originally announced February 2024.

  2. arXiv:2312.03146  [pdf, other]

    cs.AR

    LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

    Authors: Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan

    Abstract: In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times an…

    Submitted 5 December, 2023; originally announced December 2023.

  3. arXiv:2309.09405  [pdf, other]

    cs.AI cs.CL cs.CV

    Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization

    Authors: Yoonsoo Nam, Adam Lehavi, Daniel Yang, Digbalay Bose, Swabha Swayamdipta, Shrikanth Narayanan

    Abstract: Video summarization remains a huge challenge in computer vision due to the size of the input videos to be summarized. We propose an efficient, language-only video summarizer that achieves competitive accuracy with high data efficiency. Using only textual captions obtained via a zero-shot approach, we train a language transformer model and forego image representations. This method allows us to perf…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  4. arXiv:2308.14052  [pdf, other]

    cs.CV

    MM-AU:Towards Multimodal Understanding of Advertisement Videos

    Authors: Digbalay Bose, Rajat Hebbar, Tiantian Feng, Krishna Somandepalli, Anfeng Xu, Shrikanth Narayanan

    Abstract: Advertisement videos (ads) play an integral part in the domain of Internet e-commerce as they amplify the reach of particular products to a broad audience or can serve as a medium to raise awareness about specific issues through concise narrative structures. The narrative structures of advertisements involve several elements like reasoning about the broad content (topic and the underlying message)…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM Multimedia 2023

  5. arXiv:2306.09486  [pdf, other]

    cs.DC cs.LG

    FedMultimodal: A Benchmark For Multimodal Federated Learning

    Authors: Tiantian Feng, Digbalay Bose, Tuo Zhang, Rajat Hebbar, Anil Ramakrishna, Rahul Gupta, Mi Zhang, Salman Avestimehr, Shrikanth Narayanan

    Abstract: Over the past few years, Federated Learning (FL) has become an emerging machine learning technique to tackle data privacy challenges through collaborative training. In the Federated Learning algorithm, the clients submit a locally trained model, and the server aggregates these parameters until convergence. Despite significant efforts that have been made to FL in fields like computer vision, audio,…

    Submitted 20 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: This paper was accepted to KDD 2023 Applied Data Science (ADS) track

  6. arXiv:2306.07791  [pdf, other]

    cs.SD eess.AS

    Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

    Authors: Tiantian Feng, Digbalay Bose, Xuan Shi, Shrikanth Narayanan

    Abstract: Automatic Speech Understanding (ASU) leverages the power of deep learning models for accurate interpretation of human speech, leading to a wide range of speech applications that enrich the human experience. However, training a robust ASU model requires the curation of a large number of speech samples, creating risks for privacy breaches. In this work, we investigate using foundation models to assi…

    Submitted 13 June, 2023; originally announced June 2023.

  7. arXiv:2304.08614  [pdf, ps, other]

    eess.SP cs.LG

    Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients

    Authors: Kleanthis Avramidis, Kranti Adsul, Digbalay Bose, Shrikanth Narayanan

    Abstract: This paper presents the approach and results of USC SAIL's submission to the Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting relapses in psychotic patients. Relapse prediction has proven to be challenging, primarily due to the heterogeneity of symptoms and responses to treatment between individuals. We address these challenges by investigating the use of sleep behavior…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: 2 pages, 1 table, ICASSP 2023, Grand Challenges Track

  8. arXiv:2303.06904  [pdf, other]

    cs.CV cs.AI cs.CL

    Contextually-rich human affect perception using multimodal scene information

    Authors: Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Shrikanth Narayanan

    Abstract: The process of human affect understanding involves the ability to infer person specific emotional states from various sources including images, speech, and language. Affect perception from images has predominantly focused on expressions extracted from salient face crops. However, emotions perceived by humans rely on multiple contextual cues including social settings, foreground interactions, and a…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

  9. arXiv:2302.07315  [pdf, other]

    eess.AS cs.LG cs.SD

    A dataset for Audio-Visual Sound Event Detection in Movies

    Authors: Rajat Hebbar, Digbalay Bose, Krishna Somandepalli, Veena Vijai, Shrikanth Narayanan

    Abstract: Audio event detection is a widely studied audio processing task, with applications ranging from self-driving cars to healthcare. In-the-wild datasets such as Audioset have propelled research in this field. However, many efforts typically involve manual annotation and verification, which is expensive to perform at scale. Movies depict various real-life and fictional scenarios which makes them a ric…

    Submitted 14 February, 2023; originally announced February 2023.

  10. arXiv:2210.15826  [pdf, other]

    eess.SP cs.HC

    Multimodal Estimation of Change Points of Physiological Arousal in Drivers

    Authors: Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth Narayanan

    Abstract: Detecting unsafe driving states, such as stress, drowsiness, and fatigue, is an important component of ensuring driving safety and an essential prerequisite for automatic intervention systems in vehicles. These concerning conditions are primarily connected to the driver's low or high arousal levels. In this study, we describe a framework for processing multimodal physiological time-series from wea…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 tables, 4 figures

  11. arXiv:2210.11065  [pdf, other]

    cs.CV cs.CL cs.MM

    MovieCLIP: Visual Scene Recognition in Movies

    Authors: Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Haoyang Zhang, Yin Cui, Kree Cole-McLaughlin, Huisheng Wang, Shrikanth Narayanan

    Abstract: Longform media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes. Domain specific challenges associated with visual scenes in movies include transitions, person coverage, and a wide array of real-life and fictional scenarios. Existing visual scene datasets in movies have limited taxonomies and don't consider the visual scene transition w…

    Submitted 22 October, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted to 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023). Project website with supplemental material: https://sail.usc.edu/~mica/MovieCLIP/. Revised version with updated author affiliations

  12. arXiv:2110.06486  [pdf, other]

    cs.CV cs.CL

    Understanding of Emotion Perception from Art

    Authors: Digbalay Bose, Krishna Somandepalli, Souvik Kundu, Rimita Lahiri, Jonathan Gratch, Shrikanth Narayanan

    Abstract: Computational modeling of the emotions evoked by art in humans is a challenging problem because of the subjective and nuanced nature of art and affective signals. In this paper, we consider the above-mentioned problem of understanding emotions evoked in viewers by artwork using both text and visual modalities. Specifically, we analyze images and the accompanying text captions from the viewers expr…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 5 figures. Accepted at ICCV2021: 4th Workshop on Closing the loop between Vision and Language

  13. arXiv:2110.05021  [pdf, other]

    cs.CL cs.LG

    Cross Domain Emotion Recognition using Few Shot Knowledge Transfer

    Authors: Justin Olah, Sabyasachee Baruah, Digbalay Bose, Shrikanth Narayanan

    Abstract: Emotion recognition from text is a challenging task due to diverse emotion taxonomies, lack of reliable labeled data in different domains, and highly subjective annotation standards. Few-shot and zero-shot techniques can generalize across unseen emotions by projecting the documents and emotion labels onto a shared embedding space. In this work, we explore the task of few-shot emotion recognition b…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 5 pages, 4 figures

  14. arXiv:2101.02991  [pdf]

    cs.AI

    Artificial Intelligence enabled Smart Learning

    Authors: Faisal Khan, Debdeep Bose

    Abstract: Artificial Intelligence (AI) is a discipline of computer science that deals with machine intelligence. It is essential to bring AI into the context of learning because it helps in analysing the enormous amounts of data that is collected from individual students, teachers and academic staff. The major priorities of implementing AI in education are making innovative use of existing digital technolog…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: 4

    Journal ref: ETH Learning and Teaching Journal: ICED 2020 Proceedings (2020) 153-156

  15. Clustering using Vector Membership: An Extension of the Fuzzy C-Means Algorithm

    Authors: Srinjoy Ganguly, Digbalay Bose, Amit Konar

    Abstract: Clustering is an important facet of explorative data mining and finds extensive use in several fields. In this paper, we propose an extension of the classical Fuzzy C-Means clustering algorithm. The proposed algorithm, abbreviated as VFC, adopts a multi-dimensional membership vector for each data point instead of the traditional, scalar membership value defined in the original algorithm. The membe…

    Submitted 14 December, 2013; originally announced December 2013.

    Comments: 6 pages, 8 figures and 1 table (Conference Paper)

    Journal ref: Proceedings of the IEEE International Conference on Advanced Computing (ICoAC)-2013, pp.XX-XX,Chennai, India, 18 - 20 December (2013)

  16. The IceProd Framework: Distributed Data Processing for the IceCube Neutrino Observatory

    Authors: M. G. Aartsen, R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, D. Altmann, C. Arguelles, J. Auffenberg, X. Bai, M. Baker, S. W. Barwick, V. Baum, R. Bay, J. J. Beatty, J. Becker Tjus, K. -H. Becker, S. BenZvi, P. Berghaus, D. Berley, E. Bernardini, A. Bernhard, D. Z. Besson, G. Binder, D. Bindig, et al. (262 additional authors not shown)

    Abstract: IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. IceProd is a distributed management system based on Python, XML-RPC and GridFTP. It…

    Submitted 22 August, 2014; v1 submitted 22 November, 2013; originally announced November 2013.

    Journal ref: Journal of Parallel & Distributed Computing 75:198,2015

  17. arXiv:1111.4090  [pdf]

    cs.NI

    Different types of attacks in Mobile ADHOC Network

    Authors: Aniruddha Bhattacharyya, Arnab Banerjee, Dipayan Bose, Himadri Nath Saha, Debika Bhattacharya

    Abstract: Security in mobile AD HOC network is a big challenge as it has no centralized authority which can supervise the individual nodes operating in the network. The attacks can come from both inside the network and from the outside. We are trying to classify the existing attacks into two broad categories: DATA traffic attacks and CONTROL traffic attacks. We will also be discussing the presently proposed…

    Submitted 17 November, 2011; originally announced November 2011.

    Comments: 11 Pages, 8 Figures

  18. arXiv:1011.2163  [pdf]

    cs.SE

    Component Based Development

    Authors: Debayan Bose

    Abstract: Component Based Approach has been introduced in core engineering discipline long back but the introduction to component based concept in software perspective is recently developed by Object Management Group. Its benefits from the re-usability point of view is enormous. The intertwining relationship of domain engineering with component based software engineering is analyzed. The object oriented app…

    Submitted 9 November, 2010; originally announced November 2010.

    Comments: 25 pages, 3 figures