Showing 1–50 of 70 results for author: Mitra, B

Searching in archive cs.
  1. arXiv:2405.11612  [pdf, other]

    cs.IR cs.AI

    Sociotechnical Implications of Generative Artificial Intelligence for Information Access

    Authors: Bhaskar Mitra, Henriette Cramer, Olya Gurevich

    Abstract: Robust access to trustworthy information is a critical need for society with implications for knowledge production, public health education, and promoting informed citizenry in democratic societies. Generative AI technologies may enable new ways to access information and improve effectiveness of existing information retrieval systems but we are only starting to understand and grapple with their lo…

    Submitted 19 May, 2024; originally announced May 2024.

  2. arXiv:2405.07767  [pdf, other]

    cs.IR cs.AI

    Synthetic Test Collections for Retrieval Evaluation

    Authors: Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos

    Abstract: Test collections play a vital role in evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recen…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: SIGIR 2024

  3. arXiv:2404.17313  [pdf, other]

    cs.IR

    Towards Group-aware Search Success

    Authors: Haolun Wu, Bhaskar Mitra, Nick Craswell

    Abstract: Traditional measures of search success often overlook the varying information needs of different demographic groups. To address this gap, we introduce a novel metric, named Group-aware Search Success (GA-SS). GA-SS redefines search success to ensure that all demographic groups achieve satisfaction from search outcomes. We introduce a comprehensive mathematical framework to calculate GA-SS, incorpo…

    Submitted 22 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  4. arXiv:2403.17901  [pdf, other]

    cs.IR

    Search and Society: Reimagining Information Access for Radical Futures

    Authors: Bhaskar Mitra

    Abstract: Information retrieval (IR) technologies and research are undergoing transformative changes. It is our perspective that the community should accept this opportunity to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary…

    Submitted 26 March, 2024; originally announced March 2024.

  5. arXiv:2402.04437  [pdf, other]

    cs.CL cs.LG

    Learning to Extract Structured Entities Using Language Models

    Authors: Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, Bhaskar Mitra

    Abstract: Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centri…

    Submitted 18 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  6. arXiv:2401.17545  [pdf]

    cs.SE eess.SY

    Three-Stage Adjusted Regression Forecasting (TSARF) for Software Defect Prediction

    Authors: Shadow Pritchard, Bhaskar Mitra, Vidhyashree Nagaraju

    Abstract: Software reliability growth models (SRGM) enable failure data collected during testing. Specifically, nonhomogeneous Poisson process (NHPP) SRGM are the most commonly employed models. While software reliability growth models are important, efficient modeling of complex software systems increases the complexity of models. Increased model complexity presents a challenge in identifying robust and com…

    Submitted 30 January, 2024; originally announced January 2024.

  7. arXiv:2401.09410  [pdf, other]

    cs.CY cs.AI cs.HC

    Through the Looking-Glass: Transparency Implications and Challenges in Enterprise AI Knowledge Systems

    Authors: Karina Cortiñas-Lorenzo, Siân Lindley, Ida Larsen-Ledet, Bhaskar Mitra

    Abstract: Knowledge can't be disentangled from people. As AI knowledge systems mine vast volumes of work-related data, the knowledge that's being extracted and surfaced is intrinsically linked to the people who create and use it. When these systems get embedded in organizational settings, the information that is brought to the foreground and the information that's pushed to the periphery can influence how i…

    Submitted 17 January, 2024; originally announced January 2024.

  8. arXiv:2312.10076  [pdf, ps, other]

    cs.CY

    A Framework for Exploring the Consequences of AI-Mediated Enterprise Knowledge Access and Identifying Risks to Workers

    Authors: Anna Gausen, Bhaskar Mitra, Siân Lindley

    Abstract: Organisations generate vast amounts of information, which has resulted in a long-term research effort into knowledge access systems for enterprise settings. Recent developments in artificial intelligence, in relation to large language models, are poised to have significant impact on knowledge access. This has the potential to shape the workplace and knowledge in new and unanticipated ways. Many ri…

    Submitted 30 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 19 pages, 1 table

  9. arXiv:2312.05253  [pdf, other]

    cs.LG cs.AI

    DiSK: A Diffusion Model for Structured Knowledge

    Authors: Ouail Kitouni, Niklas Nolte, James Hensman, Bhaskar Mitra

    Abstract: Structured (dictionary-like) data presents challenges for left-to-right language models, as they can struggle with structured entities for a wide variety of reasons such as formatting and sensitivity to the order in which attributes are presented. Tabular generative models suffer from a different set of limitations such as their lack of flexibility. We introduce Diffusion Models of Structured Know…

    Submitted 7 February, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 24 pages, 12 figures

  10. arXiv:2310.01297  [pdf, other]

    cs.HC cs.AI cs.CL cs.PL

    Co-audit: tools to help humans double-check AI-generated content

    Authors: Andrew D. Gordon, Carina Negreanu, José Cambronero, Rasika Chakravarthy, Ian Drosos, Hao Fang, Bhaskar Mitra, Hannah Richardson, Advait Sarkar, Stephanie Simmons, Jack Williams, Ben Zorn

    Abstract: Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences to help the user double-check a piece of AI-generat…

    Submitted 2 October, 2023; originally announced October 2023.

  11. arXiv:2309.10621  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Large language models can accurately predict searcher preferences

    Authors: Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra

    Abstract: Relevance labels, which indicate whether a search result is valuable to a searcher, are key to evaluating and optimising search systems. The best way to capture the true preferences of users is to ask them for their careful feedback on which results would be useful, but this approach does not scale to produce a large number of labels. Getting relevance labels at scale is usually done with third-pa…

    Submitted 16 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  12. arXiv:2309.03294  [pdf, other]

    cs.CR

    MALITE: Lightweight Malware Detection and Classification for Constrained Devices

    Authors: Sidharth Anand, Barsha Mitra, Soumyadeep Dey, Abhinav Rao, Rupsa Dhar, Jaideep Vaidya

    Abstract: Today, malware is one of the primary cyberthreats to organizations. Malware has pervaded almost every type of computing device including the ones having limited memory, battery and computation power such as mobile phones, tablets and embedded devices like Internet-of-Things (IoT) devices. Consequently, the privacy and security of the malware infected systems and devices have been heavily jeopardiz…

    Submitted 6 September, 2023; originally announced September 2023.

  13. arXiv:2306.13149  [pdf, other]

    cs.HC cs.LG

    "Filling the Blanks": Identifying Micro-activities that Compose Complex Human Activities of Daily Living

    Authors: Soumyajit Chatterjee, Bivas Mitra, Sandip Chakraborty

    Abstract: Complex activities of daily living (ADLs) often consist of multiple micro-activities. When performed sequentially, these micro-activities help the user accomplish the broad macro-activity. Naturally, a deeper understanding of these micro-activities can help develop more sophisticated human activity recognition (HAR) models and add explainability to their inferred conclusions. Previous research has…

    Submitted 7 February, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: 23 pages, 4 tables, 7 figures

  14. Patterns of gender-specializing query reformulation

    Authors: Amifa Raj, Bhaskar Mitra, Nick Craswell, Michael D. Ekstrand

    Abstract: Users of search systems often reformulate their queries by adding query terms to reflect their evolving information need or to more precisely express their information need when the system fails to surface relevant content. Analyzing these query reformulations can inform us about both system and user behavior. In this work, we study a special category of query reformulations that involve specifyin…

    Submitted 25 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23) [2023]

  15. arXiv:2302.11370  [pdf, other]

    cs.IR

    Recall, Robustness, and Lexicographic Evaluation

    Authors: Fernando Diaz, Bhaskar Mitra

    Abstract: Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in ra…

    Submitted 8 March, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Under review

  16. arXiv:2301.05277  [pdf, ps, other]

    cs.HC

    DriCon: On-device Just-in-Time Context Characterization for Unexpected Driving Events

    Authors: Debasree Das, Sandip Chakraborty, Bivas Mitra

    Abstract: Driving is a complex task carried out under the influence of diverse spatial objects and their temporal interactions. Therefore, a sudden fluctuation in driving behavior can be due to either a lack of driving skill or the effect of various on-road spatial factors such as pedestrian movements, peer vehicles' actions, etc. Therefore, understanding the context behind a degraded driving behavior just-…

    Submitted 12 January, 2023; originally announced January 2023.

  17. arXiv:2301.05046  [pdf, other]

    cs.IR

    Taking Search to Task

    Authors: Chirag Shah, Ryen W. White, Paul Thomas, Bhaskar Mitra, Shawon Sarkar, Nicholas Belkin

    Abstract: The importance of tasks in information retrieval (IR) has been long argued for, addressed in different ways, often ignored, and frequently revisited. For decades, scholars made a case for the role that a user's task plays in how and why that user engages in search and what a search system should do to assist. But for the most part, the IR community has been too focused on query processing and assu…

    Submitted 12 January, 2023; originally announced January 2023.

  18. arXiv:2212.14464  [pdf, other]

    cs.IR

    Result Diversification in Search and Recommendation: A Survey

    Authors: Haolun Wu, Yansen Zhang, Chen Ma, Fuyuan Lyu, Bowei He, Bhaskar Mitra, Xue Liu

    Abstract: Diversifying return results is an important research topic in retrieval systems in order to satisfy both the various interests of customers and the equal market exposure of providers. There has been growing attention on diversity-aware research during recent years, accompanied by a proliferation of literature on methods to promote diversity in search and recommendation. However, diversity-aware st…

    Submitted 18 February, 2024; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 20 pages

  19. arXiv:2209.03819  [pdf, other]

    cs.HC cs.AI cs.IR

    Ethical and Social Considerations in Automatic Expert Identification and People Recommendation in Organizational Knowledge Management Systems

    Authors: Ida Larsen-Ledet, Bhaskar Mitra, Siân Lindley

    Abstract: Organizational knowledge bases are moving from passive archives to active entities in the flow of people's work. We are seeing machine learning used to enable systems that both collect and surface information as people are working, making it possible to bring out connections between people and content that were previously much less visible in order to automatically identify and highlight experts o…

    Submitted 8 September, 2022; originally announced September 2022.

  20. arXiv:2206.12993  [pdf, other]

    cs.IR cs.CL

    Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

    Authors: Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury

    Abstract: Recently, several dense retrieval (DR) models have demonstrated competitive performance to term-based retrieval that are ubiquitous in search systems. In contrast to term-based matching, DR projects queries and documents into a dense vector space and retrieves results via (approximate) nearest neighbor search. Deploying a new system, such as DR, inevitably involves tradeoffs in aspects of its perf…

    Submitted 26 June, 2022; originally announced June 2022.

  21. arXiv:2205.00048  [pdf, other]

    cs.IR cs.AI cs.LG

    Joint Multisided Exposure Fairness for Recommendation

    Authors: Haolun Wu, Bhaskar Mitra, Chen Ma, Fernando Diaz, Xue Liu

    Abstract: Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under or over exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in…

    Submitted 29 April, 2022; originally announced May 2022.

  22. I Cannot See Students Focusing on My Presentation; Are They Following Me? Continuous Monitoring of Student Engagement through "Stungage"

    Authors: Snigdha Das, Sandip Chakraborty, Bivas Mitra

    Abstract: Monitoring students' engagement and understanding their learning pace in a virtual classroom becomes challenging in the absence of direct eye contact between the students and the instructor. Continuous monitoring of eye gaze and gaze gestures may produce inaccurate outcomes when the students are allowed to do productive multitasking, such as taking notes or browsing relevant content. This paper pr…

    Submitted 18 April, 2022; originally announced April 2022.

  23. arXiv:2201.08721  [pdf, other]

    cs.IR cs.AI cs.LG

    Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

    Authors: Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang

    Abstract: Traditional information retrieval (IR) ranking models process the full text of documents. Newer models based on Transformers, however, would incur a high computational cost when processing long texts, so typically use only snippets from the document instead. The model's input based on a document's URL, title, and snippet (UTS) is akin to the summaries that appear on a search engine results page (S…

    Submitted 21 January, 2022; originally announced January 2022.

  24. arXiv:2112.06651  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    Accoustate: Auto-annotation of IMU-generated Activity Signatures under Smart Infrastructure

    Authors: Soumyajit Chatterjee, Arun Singh, Bivas Mitra, Sandip Chakraborty

    Abstract: Human activities within smart infrastructures generate a vast amount of IMU data from the wearables worn by individuals. Many existing studies rely on such sensory data for human activity recognition (HAR); however, one of the major bottlenecks is their reliance on pre-annotated or labeled data. Manual human-driven annotations are neither scalable nor efficient, whereas existing auto-annotation te…

    Submitted 2 August, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Journal ref: IEEE DCOSS-IoT 2023

  25. arXiv:2111.07060  [pdf, other]

    cs.CR cs.LG

    PAMMELA: Policy Administration Methodology using Machine Learning

    Authors: Varun Gumma, Barsha Mitra, Soumyadeep Dey, Pratik Shashikantbhai Patel, Sourabh Suman, Saptarshi Das

    Abstract: In recent years, Attribute-Based Access Control (ABAC) has become quite popular and effective for enforcing access control in dynamic and collaborative environments. Implementation of ABAC requires the creation of a set of attribute-based rules which cumulatively form a policy. Designing an ABAC policy ab initio demands a substantial amount of effort from the system administrator. Moreover, organi…

    Submitted 13 November, 2021; originally announced November 2021.

    Comments: This work is under progress

  26. arXiv:2110.08353  [pdf, other]

    cs.IR cs.AI cs.LG

    Revisiting Popularity and Demographic Biases in Recommender Evaluation and Effectiveness

    Authors: Nicola Neophytou, Bhaskar Mitra, Catherine Stinson

    Abstract: Recommendation algorithms are susceptible to popularity bias: a tendency to recommend popular items even when they fail to meet user needs. A related issue is that the recommendation quality can vary by demographic groups. Marginalized groups or groups that are under-represented in the training data may receive less relevant recommendations from these algorithms compared to others. In a recent stu…

    Submitted 15 October, 2021; originally announced October 2021.

  27. arXiv:2110.07701  [pdf, other]

    cs.IR cs.AI cs.LG

    Exposing Query Identification for Search Transparency

    Authors: Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega

    Abstract: Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Ex…

    Submitted 11 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  28. arXiv:2108.10944  [pdf, ps, other]

    cs.HC cs.CY

    Impact of Driving Behavior on Commuter's Comfort during Cab Rides: Towards a New Perspective of Driver Rating

    Authors: Rohit Verma, Sugandh Pargal, Debasree Das, Tanusree Parbat, Sai Shankar Kambalapalli, Bivas Mitra, Sandip Chakraborty

    Abstract: Commuter comfort in cab rides affects driver rating as well as the reputation of ride-hailing firms like Uber/Lyft. Existing research has revealed that commuter comfort not only varies at a personalized level but also is perceived differently on different trips for the same commuter. Furthermore, there are several factors, including driving behavior and driving environment, affecting the perceptio…

    Submitted 24 August, 2021; originally announced August 2021.

  29. arXiv:2105.09816  [pdf, other]

    cs.IR cs.CL

    Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

    Authors: Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury

    Abstract: An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency due to the cost of evaluating every passage in the d…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at SIGIR 2021 (Full Paper Track)

  30. Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models

    Authors: Daniel Cohen, Bhaskar Mitra, Oleg Lesota, Navid Rekabsaz, Carsten Eickhoff

    Abstract: In any ranking system, the retrieval model outputs a single score for a document based on its belief on how relevant it is to a given search query. While retrieval models have continued to improve with the introduction of increasingly complex architectures, few works have investigated a retrieval model's belief in the score beyond the scope of a single value. We argue that capturing the model's un…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: ACM SIGIR preprint

  31. arXiv:2105.04021  [pdf, other]

    cs.IR cs.AI cs.LG

    MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

    Authors: Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin

    Abstract: Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboard such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing new robust techniques, that work in many different setting…

    Submitted 9 May, 2021; originally announced May 2021.

  32. arXiv:2105.02951  [pdf, other]

    cs.IR

    Multi-FR: A Multi-objective Optimization Framework for Multi-stakeholder Fairness-aware Recommendation

    Authors: Haolun Wu, Chen Ma, Bhaskar Mitra, Fernando Diaz, Xue Liu

    Abstract: Nowadays, most online services are hosted on multi-stakeholder marketplaces, where consumers and producers may have different objectives. Conventional recommendation systems, however, mainly focus on maximizing consumers' satisfaction by recommending the most relevant items to each individual. This may result in unfair exposure of items, thus jeopardizing producer benefits. Additionally, they do n…

    Submitted 9 August, 2022; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: 29 pages

  33. arXiv:2104.09399  [pdf, other]

    cs.IR cs.AI cs.LG

    TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

    Authors: Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff

    Abstract: The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available. Results so far indicate that the best models with large data may be deep neural networks. This paper supports the reuse of the TREC DL test collections in three ways. First we describe the data sets in detail, documenting clearly and in one place s…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2003.07820

  34. arXiv:2104.09393  [pdf, other]

    cs.IR cs.AI cs.LG

    Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence

    Authors: Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

    Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark -- and can be considered to be an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) larger number of Tra…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2007.10434

  35. arXiv:2102.12887  [pdf, other]

    cs.IR

    Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard

    Authors: Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, Emine Yilmaz

    Abstract: Leaderboards are a ubiquitous part of modern research in applied machine learning. By design, they sort entries into some linear order, where the top-scoring entry is recognized as the "state of the art" (SOTA). Due to the rapid progress being made in information retrieval today, particularly with neural models, the top entry in a leaderboard is replaced with some regularity. These are touted as i…

    Submitted 25 February, 2021; originally announced February 2021.

  36. arXiv:2102.07662  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Overview of the TREC 2020 deep learning track

    Authors: Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos

    Abstract: This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime. We again have a document retrieval task and a passage retrieval task, each with hundreds of thousands of human-labeled training queries. We evaluate using single-shot TREC-style evaluation, to give us a picture of which ranking methods work best when large data is av…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2003.07820

  37. arXiv:2101.07124  [pdf, ps, other]

    cs.IR cs.HC

    Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification

    Authors: Jaime Arguello, Adam Ferguson, Emery Fine, Bhaskar Mitra, Hamed Zamani, Fernando Diaz

    Abstract: While current information retrieval systems are effective for known-item retrieval where the searcher provides a precise name or identifier for the item being sought, systems tend to be much less effective for cases where the searcher is unable to express a precise name or identifier. We refer to this as tip of the tongue (TOT) known-item retrieval, named after the cognitive state of not being abl…

    Submitted 18 January, 2021; originally announced January 2021.

  38. arXiv:2012.11685  [pdf]

    cs.IR cs.AI cs.CL cs.LG

    Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

    Authors: Bhaskar Mitra

    Abstract: Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents--or short passages--in response to keyword-based queries. Effective IR system…

    Submitted 19 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: PhD thesis, Univ College London (2020)

  39. arXiv:2011.07368  [pdf, other]

    cs.IR cs.AI cs.LG

    Conformer-Kernel with Query Term Independence at TREC 2020 Deep Learning Track

    Authors: Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

    Abstract: We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track. In particular, we study the impact of incorporating: (i) Explicit term matching to complement matching based on learned representations (i.e., the "Duet principle"), (ii) query term independence (i.e., the "QTI assumption") to scale the model to the full retrieval setting, and (iii)…

    Submitted 11 February, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

  40. arXiv:2008.08180  [pdf, other]

    cs.IR

    Semantic Product Search for Matching Structured Product Catalogs in E-Commerce

    Authors: Jason Ingyu Choi, Surya Kallumadi, Bhaskar Mitra, Eugene Agichtein, Faizan Javed

    Abstract: Retrieving all semantically relevant products from the product catalog is an important problem in E-commerce. Compared to web documents, product catalogs are more structured and sparse due to multi-instance fields that encode heterogeneous aspects of products (e.g. brand name and product dimensions). In this paper, we propose a new semantic product search algorithm that learns to represent and agg…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: 4 pages

  41. arXiv:2007.10434  [pdf, other]

    cs.IR cs.CL cs.LG

    Conformer-Kernel with Query Term Independence for Document Retrieval

    Authors: Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

    Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark---and can be considered to be an efficient (but slightly less effective) alternative to BERT-based ranking models. In this work, we extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption. Furthermore, to reduce the memory comp…

    Submitted 20 July, 2020; originally announced July 2020.

  42. arXiv:2006.05324  [pdf, other]

    cs.IR cs.LG

    ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

    Authors: Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck

    Abstract: Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval. However, click logs have not been publicly released for academic use, because they can be too revealing of personally or commercially sensitive information. This paper describes a click data release related to the TREC Deep Learning Track document corpu…

    Submitted 18 August, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  43. arXiv:2006.00166  [pdf, other]

    cs.IR

    Analyzing and Learning from User Interactions for Search Clarification

    Authors: Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais

    Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying question…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: To appear in the Proceedings of SIGIR 2020

  44. arXiv:2005.04908  [pdf, other]

    cs.IR

    Local Self-Attention over Long Text for Efficient Document Retrieval

    Authors: Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury

    Abstract: Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: Accepted at SIGIR 2020 (short paper)

  45. arXiv:2004.13486  [pdf, other]

    cs.IR cs.CL cs.LG

    On the Reliability of Test Collections for Evaluating Systems of Different Types

    Authors: Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos

    Abstract: As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality. Test collections are generated based on pooling results of various retrieval systems, but until recently this did not include deep learning systems. This raises a major challenge for reusable evaluation: Since dee…

    Submitted 28 April, 2020; originally announced April 2020.

  46. Evaluating Stochastic Rankings with Expected Exposure

    Authors: Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, Ben Carterette

    Abstract: We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principl…

    Submitted 20 October, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

    Comments: In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA

  47. arXiv:2003.07820  [pdf, other]

    cs.IR cs.CL cs.LG

    Overview of the TREC 2019 deep learning track

    Authors: Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees

    Abstract: The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training querie…

    Submitted 18 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  48. arXiv:1912.09910  [pdf, other]

    cs.IR

    Report on the First HIPstIR Workshop on the Future of Information Retrieval

    Authors: Laura Dietz, Bhaskar Mitra, Jeremy Pickens, Hana Anber, Sandeep Avula, Asia Biega, Adrian Boteanu, Shubham Chatterjee, Jeff Dalton, Shiri Dori-Hacohen, John Foley, Henry Feild, Ben Gamari, Rosie Jones, Pallika Kanani, Sumanta Kashyapi, Widad Machmouchi, Matthew Mitsui, Steve Nole, Alexandre Tachard Passos, Jordan Ramsdell, Adam Roegiest, David Smith, Alessandro Sordoni

    Abstract: The vision of HIPstIR is that early stage information retrieval (IR) researchers get together to develop a future for non-mainstream ideas and research agendas in IR. The first iteration of this vision materialized in the form of a three-day workshop in Portsmouth, New Hampshire, attended by 24 researchers across academia and industry. Attendees pre-submitted one or more topics that they want to pi…

    Submitted 20 December, 2019; originally announced December 2019.

  49. arXiv:1912.04471  [pdf, other]

    cs.IR cs.CL cs.LG

    Duet at TREC 2019 Deep Learning Track

    Authors: Bhaskar Mitra, Nick Craswell

    Abstract: This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019. For the document retrieval task, we adapt the Duet model to ingest a "multiple field" view of documents; we refer to the new architecture as Duet with Multiple Fields (DuetMF). A second submission combines the DuetMF model with other neural and traditional relevance estimators in a lear…

    Submitted 9 December, 2019; originally announced December 2019.

  50. arXiv:1907.03693  [pdf, ps, other]

    cs.IR cs.LG

    Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks

    Authors: Bhaskar Mitra, Corby Rosset, David Hawking, Nick Craswell, Fernando Diaz, Emine Yilmaz

    Abstract: Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document scores using these models, which can be combined with specialized data structures, such as an inverted index, for efficient retrieval. Deep neural IR models, in contras…

    Submitted 8 July, 2019; originally announced July 2019.