-
OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search
Authors:
Prabhat Agarwal,
Minhazul Islam Sk,
Nikil Pancha,
Kurchi Subhra Hazra,
Jiajing Xu,
Chuck Rosenberg
Abstract:
In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. We jointly learn a unified query embedding coupled with pin and product embeddings, leading to an improvement of $>8\%$ relevance, $>7\%$ engagement, and $>5\%$ ads CTR in Pinterest's production search system. The main contributors to these gains are improved content understanding, better multi-task learning, and real-time serving. We enrich our entity representations using diverse text derived from image captions from a generative LLM, historical engagement, and user-curated boards. Our multitask learning setup produces a single search query embedding in the same space as pin and product embeddings and compatible with pre-existing pin and product embeddings. We show the value of each feature through ablation studies, and show the effectiveness of a unified model compared to standalone counterparts. Finally, we share how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve $300k$ requests per second at low latency. Our implementation of this work is available at https://github.com/pinterest/atg-research/tree/main/omnisearchsage.
Submitted 24 April, 2024;
originally announced April 2024.
-
Data-driven Discovery with Large Generative Models
Authors:
Bodhisattwa Prasad Majumder,
Harshit Surana,
Dhruv Agarwal,
Sanchaita Hazra,
Ashish Sabharwal,
Peter Clark
Abstract:
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a set of provided datasets, without the need for additional data collection or physical experiments. We first outline several desiderata for an ideal data-driven discovery system. Then, through DATAVOYAGER, a proof-of-concept utilizing GPT-4, we demonstrate how LGMs fulfill several of these desiderata -- a feat previously unattainable -- while also highlighting important limitations in the current system that open up opportunities for novel ML research. We contend that achieving accurate, reliable, and robust end-to-end discovery systems solely through the current capabilities of LGMs is challenging. We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility.
Submitted 21 February, 2024;
originally announced February 2024.
-
To Tell The Truth: Language of Deception and Language Models
Authors:
Sanchaita Hazra,
Bodhisattwa Prasad Majumder
Abstract:
Text-based misinformation permeates online discourses, yet evidence of people's ability to discern truth from such deceptive textual content is scarce. We analyze a novel TV game show data where conversations in a high-stake environment between individuals with conflicting objectives result in lies. We investigate the manifestation of potentially verifiable language cues of deception in the presence of objective truth, a distinguishing feature absent in previous text-based deception datasets. We show that there exists a class of detectors (algorithms) that have similar truth detection performance compared to human subjects, even when the former accesses only the language cues while the latter engages in conversations with complete access to all potential sources of cues (language and audio-visual). Our model, built on a large language model, employs a bottleneck framework to learn discernible cues to determine truth, an act of reasoning in which human subjects often perform poorly, even with incentives. Our model detects novel but accurate language cues in many cases where humans failed to detect deception, opening up the possibility of humans collaborating with algorithms and ameliorating their ability to detect the truth.
Submitted 8 April, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
On Distribution-Preserving Mitigation Strategies for Communication under Cognitive Adversaries
Authors:
Soumita Hazra,
J. Harshan
Abstract:
In wireless security, cognitive adversaries are known to inject jamming energy on the victim's frequency band and monitor the same band for countermeasures thereby trapping the victim. Under the class of cognitive adversaries, we propose a new threat model wherein the adversary, upon executing the jamming attack, measures the long-term statistic of Kullback-Leibler Divergence (KLD) between its observations over each of the network frequencies before and after the jamming attack. To mitigate this adversary, we propose a new cooperative strategy wherein the victim takes the assistance for a helper node in the network to reliably communicate its message to the destination. The underlying idea is to appropriately split their energy and time resources such that their messages are reliably communicated without disturbing the statistical distribution of the samples in the network. We present rigorous analyses on the reliability and the covertness metrics at the destination and the adversary, respectively, and then synthesize tractable algorithms to obtain near-optimal division of resources between the victim and the helper. Finally, we show that the obtained near-optimal division of energy facilitates in deceiving the adversary with a KLD estimator.
Submitted 6 July, 2023;
originally announced July 2023.
-
Penalizing Proposals using Classifiers for Semi-Supervised Object Detection
Authors:
Somnath Hazra,
Pallab Dasgupta
Abstract:
Obtaining gold standard annotated data for object detection is often costly, involving human-level effort. Semi-supervised object detection algorithms solve the problem with a small amount of gold-standard labels and a large unlabelled dataset used to generate silver-standard labels. But training on the silver standard labels does not produce good results, because they are machine-generated annotations. In this work, we design a modified loss function to train on large silver standard annotated sets generated by a weak annotator. We include a confidence metric associated with the annotation as an additional term in the loss function, signifying the quality of the annotation. We test the effectiveness of our approach on various test sets and use numerous variations to compare the results with some of the current approaches to object detection. In comparison with the baseline where no confidence metric is used, we achieved a 4% gain in mAP with 25% labeled data and 10% gain in mAP with 50% labeled data by using the proposed confidence metric.
Submitted 2 June, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Cross-modal Learning of Graph Representations using Radar Point Cloud for Long-Range Gesture Recognition
Authors:
Souvik Hazra,
Hao Feng,
Gamze Naz Kiprit,
Michael Stephan,
Lorenzo Servadei,
Robert Wille,
Robert Weigel,
Avik Santra
Abstract:
Gesture recognition is one of the most intuitive ways of interaction and has gathered particular attention for human computer interaction. Radar sensors possess multiple intrinsic properties, such as their ability to work in low illumination, harsh weather conditions, and being low-cost and compact, making them highly preferable for a gesture recognition solution. However, most literature work focuses on solutions with a limited range that is lower than a meter. We propose a novel architecture for a long-range (1m - 2m) gesture recognition solution that leverages a point cloud-based cross-learning approach from camera point cloud to 60-GHz FMCW radar point cloud, which allows learning better representations while suppressing noise. We use a variant of Dynamic Graph CNN (DGCNN) for the cross-learning, enabling us to model relationships between the points at a local and global level; a Bi-LSTM network is employed to model the temporal dynamics. In the experimental results section, we demonstrate our model's overall accuracy of 98.4% for five gestures and its generalization capability.
Submitted 19 May, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Label-Aware Ranked Loss for robust People Counting using Automotive in-cabin Radar
Authors:
Lorenzo Servadei,
Huawei Sun,
Julius Ott,
Michael Stephan,
Souvik Hazra,
Thomas Stadelmayer,
Daniela Sanchez Lopera,
Robert Wille,
Avik Santra
Abstract:
In this paper, we introduce the Label-Aware Ranked loss, a novel metric loss function. Compared to the state-of-the-art Deep Metric Learning losses, this function takes advantage of the ranked ordering of the labels in regression problems. To this end, we first show that the loss minimises when datapoints of different labels are ranked and laid at uniform angles between each other in the embedding space. Then, to measure its performance, we apply the proposed loss on a regression task of people counting with a short-range radar in a challenging scenario, namely a vehicle cabin. The introduced approach improves the accuracy as well as the neighboring labels accuracy up to 83.0% and 99.9%: An increase of 6.7% and 2.1% on state-of-the-art methods, respectively.
Submitted 3 March, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Semi-Lexical Languages -- A Formal Basis for Unifying Machine Learning and Symbolic Reasoning in Computer Vision
Authors:
Briti Gangopadhyay,
Somnath Hazra,
Pallab Dasgupta
Abstract:
Human vision is able to compensate imperfections in sensory inputs from the real world by reasoning based on prior knowledge about the world. Machine learning has had a significant impact on computer vision due to its inherent ability in handling imprecision, but the absence of a reasoning framework based on domain knowledge limits its ability to interpret complex scenarios. We propose semi-lexical languages as a formal basis for dealing with imperfect tokens provided by the real world. The power of machine learning is used to map the imperfect tokens into the alphabet of the language and symbolic reasoning is used to determine the membership of input in the language. Semi-lexical languages also have bindings that prevent the variations in which a semi-lexical token is interpreted in different parts of the input, thereby leaning on deduction to enhance the quality of recognition of individual tokens. We present case studies that demonstrate the advantage of using such a framework over pure machine learning and pure symbolic methods.
Submitted 17 December, 2020; v1 submitted 25 April, 2020;
originally announced April 2020.