-
DynaMaR: Dynamic Prompt with Mask Token Representation
Authors:
Xiaodi Sun,
Sunny Rajagopalan,
Priyanka Nigam,
Weiyi Lu,
Yi Xu,
Belinda Zeng,
Trishul Chilimbi
Abstract:
Recent research has shown that large language models pretrained using unsupervised approaches can achieve significant performance improvement on many downstream tasks. Typically when adapting these language models to downstream tasks, like a classification or regression task, we employ a fine-tuning paradigm in which the sentence representation from the language model is input to a task-specific head; the model is then fine-tuned end-to-end. However, with the emergence of models like GPT-3, prompt-based fine-tuning has been proven to be a successful approach for few-shot tasks. Inspired by this work, we study discrete prompt technologies in practice. There are two issues that arise with the standard prompt approach. First, it can overfit on the prompt template. Second, it requires manual effort to formulate the downstream task as a language model problem. In this paper, we propose an improvement to prompt-based fine-tuning that addresses these two issues. We refer to our approach as DynaMaR -- Dynamic Prompt with Mask Token Representation. Results show that DynaMaR can achieve an average improvement of 10% in few-shot settings and improvement of 3.7% in data-rich settings over the standard fine-tuning approach on four e-commerce applications.
Submitted 6 June, 2022;
originally announced June 2022.
-
Embracing Structure in Data for Billion-Scale Semantic Product Search
Authors:
Vihan Lakshman,
Choon Hui Teo,
Xiaowen Chu,
Priyanka Nigam,
Abhinandan Patni,
Pooja Maknikar,
SVN Vishwanathan
Abstract:
We present principled approaches to train and deploy dyadic neural embedding models at the billion scale, focusing our investigation on the application of semantic product search. When training a dyadic model, one seeks to embed two different types of entities (e.g., queries and documents or users and movies) in a common vector space such that pairs with high relevance are positioned nearby. During inference, given an embedding of one type (e.g., a query or a user), one seeks to retrieve the entities of the other type (e.g., documents or movies, respectively) that are highly relevant. In this work, we show that exploiting the natural structure of real-world datasets helps address both challenges efficiently. Specifically, we model dyadic data as a bipartite graph with edges between pairs with positive associations. We then propose to partition this network into semantically coherent clusters and thus reduce our search space by focusing on a small subset of these partitions for a given input. During training, this technique enables us to efficiently mine hard negative examples while, at inference, we can quickly find the nearest neighbors for a given embedding. We provide offline experimental results that demonstrate the efficacy of our techniques for both training and inference on a billion-scale Amazon.com product search dataset.
Submitted 12 October, 2021;
originally announced October 2021.
-
Semantic Product Search
Authors:
Priyanka Nigam,
Yiwei Song,
Vijai Mohan,
Vihan Lakshman,
Weitian Ding,
Ankit Shingavi,
Choon Hui Teo,
Hao Gu,
Bing Yin
Abstract:
We study the problem of semantic matching in product search, that is, given a customer query, retrieve all semantically related products from the catalog. Pure lexical matching via an inverted index falls short in this respect due to several factors: a) lack of understanding of hypernyms, synonyms, and antonyms, b) fragility to morphological variants (e.g. "woman" vs. "women"), and c) sensitivity to spelling errors. To address these issues, we train a deep learning model for semantic matching using customer behavior data. Much of the recent work on large-scale semantic search using deep learning focuses on ranking for web search. In contrast, semantic matching for product search presents several novel challenges, which we elucidate in this paper. We address these challenges by a) developing a new loss function that has an inbuilt threshold to differentiate between random negative examples, impressed but not purchased examples, and positive examples (purchased items), b) using average pooling in conjunction with n-grams to capture short-range linguistic patterns, c) using hashing to handle out of vocabulary tokens, and d) using a model parallel training architecture to scale across 8 GPUs. We present compelling offline results that demonstrate at least 4.7% improvement in Recall@100 and 14.5% improvement in mean average precision (MAP) over baseline state-of-the-art semantic search methods using the same tokenization method. Moreover, we present results and discuss learnings from online A/B tests which demonstrate the efficacy of our method.
Submitted 1 July, 2019;
originally announced July 2019.
-
Scaling of the Puffing Strouhal Number for Buoyant Jets
Authors:
Nicholas T. Wimer,
Caelan Lapointe,
Jason D. Christopher,
Siddharth P. Nigam,
Torrey R. S. Hayden,
Aniruddha Upadhye,
Mark Strobel,
Gregory B. Rieker,
Peter E. Hamlington
Abstract:
Prior research has shown that round and planar buoyant jets "puff" at a frequency that depends on the balance of momentum and buoyancy fluxes at the inlet, as parametrized by the Richardson number. Experiments have revealed the existence of scaling relations between the Strouhal number of the puffing and the inlet Richardson number, but geometry-specific relations are required when the characteristic length is taken to be the diameter (for round inlets) or width (for planar inlets). In the present study, we show that when the hydraulic radius of the inlet is instead used as the characteristic length, a single Strouhal-Richardson scaling relation is obtained for a variety of inlet geometries. In particular, we use adaptive mesh numerical simulations to measure puffing Strouhal numbers for circular, rectangular (with three different aspect ratios), triangular, and annular high-temperature buoyant jets over a range of Richardson numbers. We then combine these results with prior experimental data for round and planar buoyant jets to propose a new scaling relation that accurately describes puffing Strouhal numbers for various inlet shapes and for Richardson numbers spanning over four orders of magnitude.
Submitted 2 April, 2019;
originally announced April 2019.