
Improved Object-Based Style Transfer with Single Deep Network

Authors: Harshmohan Kulkarni, Om Khare, Ninad Barve, Sunil Mane

Abstract: This research paper proposes a novel methodology for image-to-image style transfer on objects utilizing a single deep convolutional neural network. The proposed approach leverages the You Only Look Once version 8 (YOLOv8) segmentation model and the backbone neural network of YOLOv8 for style transfer. The primary objective is to enhance the visual appeal of objects in images by seamlessly transferring artistic styles while preserving the original object characteristics. The proposed approach's novelty lies in combining segmentation and style transfer in a single deep convolutional neural network. This approach omits the need for multiple stages or models, thus resulting in simpler training and deployment of the model for practical applications. The results of this approach are shown on two content images by applying different style images. The paper also demonstrates the ability to apply style transfer on multiple objects in the same image.

Submitted 15 April, 2024; originally announced April 2024.

Comments: In Proceedings of the Fourth International Conference on Innovations in Computational Intelligence and Computer Vision

arXiv:2307.16779 [pdf, other]

doi 10.1145/3539618.3591715

Lexically-Accelerated Dense Retrieval

Authors: Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder

Abstract: Retrieval approaches that score documents based on learned dense vectors (i.e., dense retrieval) rather than lexical signals (i.e., conventional retrieval) are increasingly popular. Their ability to identify related documents that do not necessarily contain the same terms as those appearing in the user's query (thereby improving recall) is one of their key advantages. However, to actually achieve these gains, dense retrieval approaches typically require an exhaustive search over the document collection, making them considerably more expensive at query-time than conventional lexical approaches. Several techniques aim to reduce this computational overhead by approximating the results of a full dense retriever. Although these approaches reasonably approximate the top results, they suffer in terms of recall -- one of the key advantages of dense retrieval. We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple-yet-effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness. LADR uses lexical retrieval techniques to seed a dense retrieval exploration that uses a document proximity graph. We explore two variants of LADR: a proactive approach that expands the search space to the neighbors of all seed documents, and an adaptive approach that selectively searches the documents with the highest estimated relevance in an iterative fashion. Through extensive experiments across a variety of dense retrieval models, we find that LADR establishes a new dense retrieval effectiveness-efficiency Pareto frontier among approximate k nearest neighbor techniques. Further, we find that when tuned to take around 8ms per query in retrieval latency on our hardware, LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.

Submitted 31 July, 2023; originally announced July 2023.

Comments: SIGIR 2023
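
Illustrative sketch (not from the paper): the two LADR variants described in the abstract above can be pictured roughly as follows. This is a minimal toy interpretation built only from the abstract's wording. The function names, the `depth` parameter, the dot-product scoring, and the dictionary-based proximity graph are all assumptions made for illustration, and the authors' actual implementation may differ in every detail.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dense relevance score (assumed here to be a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def rank(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k best doc ids, highest score first (ties broken by id)."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def proactive_ladr(query_vec, seeds, neighbors, doc_vecs, k):
    """Score the lexical seeds plus the graph neighbors of every seed."""
    candidates = set(seeds)
    for doc in seeds:
        candidates.update(neighbors.get(doc, []))
    scores = {d: dot(query_vec, doc_vecs[d]) for d in candidates}
    return rank(scores, k)


def adaptive_ladr(query_vec, seeds, neighbors, doc_vecs, k, depth):
    """Repeatedly expand neighbors of the current top `depth` documents.

    Stops once every document in the current top `depth` has already
    been expanded, i.e. the top of the ranking has stabilized.
    """
    scores = {d: dot(query_vec, doc_vecs[d]) for d in seeds}
    expanded = set()
    while True:
        frontier = [d for d in rank(scores, depth) if d not in expanded]
        if not frontier:
            break
        for doc in frontier:
            expanded.add(doc)
            for n in neighbors.get(doc, []):
                if n not in scores:  # score each document at most once
                    scores[n] = dot(query_vec, doc_vecs[n])
    return rank(scores, k)
```

In this toy setup, `seeds` stands in for the top results of a lexical retriever such as BM25, and `neighbors` for a precomputed document proximity graph. The adaptive variant can reach relevant documents several hops away from the seeds, while the proactive variant only looks one hop out.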

arXiv:2302.08572 [pdf, other]

Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers

Authors: Melissa Hall, Bobbie Chern, Laura Gustafson, Denisse Ventura, Harshad Kulkarni, Candace Ross, Nicolas Usunier

Abstract: Disaggregated performance metrics across demographic groups are a hallmark of fairness assessments in computer vision. These metrics successfully incentivized performance improvements on person-centric tasks such as face analysis and are used to understand risks of modern models. However, there is a lack of discussion on the vulnerabilities of these measurements for more complex computer vision tasks. In this paper, we consider multi-label image classification and, specifically, object categorization tasks. First, we highlight design choices and trade-offs for measurement that involve more nuance than discussed in prior computer vision literature. These challenges are related to the necessary scale of data, definition of groups for images, choice of metric, and dataset imbalances. Next, through two case studies using modern vision models, we demonstrate that naive implementations of these assessments are brittle. We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments, both in terms of magnitude and direction (on which group the classifiers work best) of disparities. Based on ablation studies, we propose some recommendations to increase the reliability of these assessments. Finally, through a qualitative analysis we find that concepts with large disparities tend to have varying definitions and representations between groups, with inconsistencies across datasets and annotators. While this result suggests avenues for mitigation through more consistent data collection, it also highlights that ambiguous label definitions remain a challenge when performing model assessments. Vision models are expanding and becoming more ubiquitous; it is even more important that our disparity assessments accurately reflect the true performance of models.

Submitted 16 February, 2023; originally announced February 2023.

arXiv:2205.08355 [pdf, other]

Demystifying the Data Need of ML-surrogates for CFD Simulations

Authors: Tongtao Zhang, Biswadip Dey, Krishna Veeraraghavan, Harshad Kulkarni, Amit Chakraborty

Abstract: Computational fluid dynamics (CFD) simulations, a critical tool in various engineering applications, often require significant time and compute power to predict flow properties. The high computational cost associated with CFD simulations significantly restricts the scope of design space exploration and limits their use in planning and operational control. To address this issue, machine learning (ML) based surrogate models have been proposed as a computationally efficient tool to accelerate CFD simulations. However, a lack of clarity about CFD data requirements often challenges the widespread adoption of ML-based surrogates among design engineers and CFD practitioners. In this work, we propose an ML-based surrogate model to predict the temperature distribution inside the cabin of a passenger vehicle under various operating conditions and use it to demonstrate the trade-off between prediction performance and training dataset size. Our results show that the prediction accuracy is high and stable even when the training size is gradually reduced from 2000 to 200. The ML-based surrogates also reduce the compute time from ~30 minutes to around ~9 milliseconds. Moreover, even when only 50 CFD simulations are used for training, the temperature trend (e.g., locations of hot/cold regions) predicted by the ML-surrogate matches quite well with the results from CFD simulations.

Submitted 5 May, 2022; originally announced May 2022.

Comments: Published on AI2ASE AAAI2022

MSC Class: I.6

arXiv:2106.13767 [pdf]

Sentiment Progression based Searching and Indexing of Literary Textual Artefacts

Authors: Hrishikesh Kulkarni, Bradly Alicea

Abstract: Literary artefacts are generally indexed and searched based on titles, meta data and keywords over the years. This searching and indexing works well when user/reader already knows about that particular creative textual artefact or document. This indexing and search hardly takes into account interest and emotional makeup of readers and its mapping to books. When a person is looking for a literary textual artefact, he/she might be looking for not only information but also to seek the joy of reading. In case of literary artefacts, progression of emotions across the key events could prove to be the key for indexing and searching. In this paper, we establish clusters among literary artefacts based on computational relationships among sentiment progressions using intelligent text analysis. We have created a database of 1076 English titles + 20 Marathi titles and also used database http://www.cs.cmu.edu/~dbamman/booksummaries.html with 16559 titles and their summaries. We have proposed Sentiment Progression based Indexing for searching and recommending books. This can be used to create personalized clusters of book titles of interest to readers. The analysis clearly suggests better searching and indexing when we are targeting book lovers looking for a particular type of book or creative artefact. This indexing and searching can find many real-life applications for recommending books.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 12 pages, 2 figures, accepted at NLDB 2021

arXiv:2106.00891 [pdf, other]

High-Quality Diversification for Task-Oriented Dialogue Systems

Authors: Zhiwen Tang, Hrishikesh Kulkarni, Grace Hui Yang

Abstract: Many task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies that respond to the user appropriately and complete the tasks successfully. Training DRL agents with diverse dialogue trajectories prepare them well for rare user requests and unseen situations. One effective diversification method is to let the agent interact with a diverse set of learned user models. However, trajectories created by these artificial user models may contain generation errors, which can quickly propagate into the agent's policy. It is thus important to control the quality of the diversification and resist the noise. In this paper, we propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators. Our method, Intermittent Short Extension Ensemble (I-SEE), constrains the intensity to interact with an ensemble of diverse user models and effectively controls the quality of the diversification. Evaluations on the Multiwoz dataset show that I-SEE successfully boosts the performance of several state-of-the-art DRL dialogue agents.

Submitted 8 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: Accepted by ACL-IJCNLP 2021 (Findings of ACL)

arXiv:1910.06859 [pdf]

Computational Psychology to Embed Emotions into News or Advertisements to Increase Reader Affinity

Authors: Hrishikesh Kulkarni, P Joshi, P Chande

Abstract: Readers take decisions about going through the complete news based on many factors. The emotional impact of the news title on reader is one of the most important factors. Cognitive ergonomics tries to strike the balance between work, product and environment with human needs and capabilities. The utmost need to integrate emotions in the news as well as advertisements cannot be denied. The idea is that news or advertisement should be able to engage the reader on emotional and behavioral platform. While achieving this objective there is need to learn about reader behavior and use computational psychology while presenting as well as writing news or advertisements. This paper based on Machine Learning, tries to map behavior of the reader with the news/advertisements and also provide inputs for affective value for building personalized news or advertisements presentations. The affective value of the news is determined and news artifacts are mapped to reader. The algorithm suggests the most suitable news for readers while understanding emotional traits required for personalization. This work can be used to improve reader satisfaction through embedding emotions in the reading material and prioritizing news presentations. It can be used to map personal reading material range, personalized programs and ranking programs, advertisements with reference to individuals.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1908.00234 [pdf]

Cultural association based on machine learning for team formation

Authors: Hrishikesh Kulkarni, Bradly Alicea

Abstract: Culture is core to human civilization, and is essential for human intellectual achievements in social context. Culture also influences how humans work together, perform particular task and overall lifestyle and dealing with other groups of civilization. Thus, culture is concerned with establishing shared ideas, particularly those playing a key role in success. Does it impact on how two individuals can work together in achieving certain goals? In this paper, we establish a means to derive cultural association and map it to culturally mediated success. Human interactions with the environment are typically in the form of expressions. Association between culture and behavior produce similar beliefs which lead to common principles and actions, while cultural similarity as a set of common expressions and responses. To measure cultural association among different candidates, we propose the use of a Graphical Association Method (GAM). The behaviors of candidates are captured through series of expressions and represented in the graphical form. The association among corresponding node and core nodes is used for the same. Our approach provides a number of interesting results and promising avenues for future applications.

Submitted 1 August, 2019; originally announced August 2019.

Comments: 10 pages, 2 figures

Showing 1–8 of 8 results for author: Kulkarni, H