MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation

Zijie J. Wang (ORCID 0000-0003-4360-1423), Georgia Institute of Technology, Atlanta, Georgia, USA, and Duen Horng Chau (ORCID 0000-0001-9824-3323), Georgia Institute of Technology, Atlanta, Georgia, USA
(2024)
Abstract.

Retrieval-augmented text generation (RAG) addresses the common limitations of large language models (LLMs), such as hallucination, by retrieving information from an updatable external knowledge base. However, existing approaches often require dedicated backend servers for data storage and retrieval, thereby limiting their applicability in use cases that require strict data privacy, such as personal finance, education, and medicine. To address the pressing need for client-side dense retrieval, we introduce MeMemo, the first open-source JavaScript toolkit that adapts the state-of-the-art approximate nearest neighbor search technique HNSW to browser environments. Developed with modern and native Web technologies, such as IndexedDB and Web Workers, our toolkit leverages client-side hardware capabilities to enable researchers and developers to efficiently search through millions of high-dimensional vectors in the browser. MeMemo enables exciting new design and research opportunities, such as private and personalized content creation and interactive prototyping, as demonstrated in our example application RAG Playground. Reflecting on our work, we discuss the opportunities and challenges for on-device dense retrieval. MeMemo is available at https://github.com/poloclub/mememo.

Neural information retrieval, On-device, Large language models
Published in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), July 14–18, 2024, Washington, DC, USA. Copyright: rights retained.
DOI: 10.1145/3626772.3657662. ISBN: 979-8-4007-0431-4/24/07.
CCS Concepts: Information systems → Information retrieval; Human-centered computing → Human computer interaction (HCI); Computing methodologies → Machine learning.
Fig. 1. MeMemo is the first open-source JavaScript toolkit for in-browser dense neural retrieval. We demonstrate the capabilities of MeMemo by developing RAG Playground that enables AI developers to prototype retrieval-augmented text generation (RAG) apps locally in their browsers. With RAG Playground, developers can (A) enter various user queries, (B) search for semantically similar documents from an in-browser vector database, and (C) augment a text prompt with retrieved documents. (D) This allows developers to rapidly test if in-browser large language models generate more reliable responses to the query.

1. Introduction

Retrieval augmented generation (RAG) (Lewis et al., 2020) with large language models (LLMs) has gained immense popularity from both practitioners and researchers, especially in applications such as domain-specific chatbots (Semnani et al., 2023; Prince et al., 2023), code generation (Soare et al., 2022; Zhou et al., 2023), and interactive agents (Hsieh et al., 2023; Ruan et al., 2023). RAG can improve the accuracy and reliability of LLMs’ generated text (Shuster et al., 2021), by providing these models, such as GPT-4 (OpenAI, 2023) and Llama 2 (Touvron et al., 2023), with context information retrieved from an updatable and external knowledge base. Compared to other techniques, such as fine-tuning (Hu et al., 2021) and prompt tuning (Lester et al., 2021), that aim to improve LLM’s performance on new or specific domains, RAG is often favored by AI practitioners (Martineau, 2023) due to its ease of implementation, flexibility in maintenance, and superior performance (Ovadia et al., 2024).

However, current RAG systems rely on dedicated backend servers to store and retrieve external documents relevant to the user’s query. This is often achieved through nearest neighbor search using dense embedding vector representations of documents (Balaguer et al., 2024; Li et al., 2023a). The need for centralized backend servers limits the applicability of RAG in domains that prioritize data privacy, such as personal finance, education, and medicine (e.g., Chung et al., 2023; Wutschitz et al., 2023; Fuchsbauer et al., 2021; Ghodratnama and Zakershahrak, 2023). Furthermore, implementing and hosting a vector storage and dense retriever pose additional challenges for AI novices and everyday LLM users (Draxler et al., 2023; Zamfirescu-Pereira et al., 2023), thereby increasing the barrier to entry for learning and applying RAG.

To address these pressing challenges, we present MeMemo, the first JavaScript toolkit that offloads vector storage and dense retrieval to the client—empowering a broader range of audiences to leverage cutting-edge retrieval techniques to enhance their LLM experiences. Our work makes the following key contributions:

  • MeMemo, the first scalable JavaScript library that enables users to store and retrieve large vector databases directly in their browsers. Our toolkit adapts the state-of-the-art approximate nearest neighbor search Hierarchical Navigable Small World graphs (HNSW) (Malkov and Yashunin, 2020) to the Web environment. By leveraging a novel prefetching strategy and modern Web technologies, such as IndexedDB and Web Workers, MeMemo empowers users to retrieve dense vectors with both privacy and efficiency (§ 3).

  • RAG Playground, an example application of on-device dense retrieval. We demonstrate the capabilities of MeMemo by developing RAG Playground (Fig. 1), a novel client-side tool using on-device retrieval to enable interactive learning about RAG and rapid prototyping of RAG applications (§ 2). We highlight the benefit of on-device retrieval regarding privacy, ubiquity, and interactivity. Finally, we discuss the opportunities and challenges for future research on client-side retrieval augmentation and personalized text generation (§ 5). RAG Playground is publicly accessible at https://poloclub.github.io/mememo.

  • An open-source implementation (MeMemo code: https://github.com/poloclub/mememo) that lowers the barrier for researchers and developers to apply retrieval augmentation to improve text generation on the client side. We provide comprehensive documentation and an example application to help users use MeMemo to implement on-device retrieval augmentation across different Web environments. MeMemo is developed with minimal dependencies and TypeScript, a statically typed programming language, making it a maintainable and easy-to-use resource for the information retrieval community.

We hope our work will inspire the design, research, and development of on-device retrieval, enabling everyone to use text-generative models and other AI technologies more easily and privately.

2. MeMemo in Action

We present two hypothetical usage scenarios, developing (§ 2.1) and using (§ 2.2) RAG Playground, to demonstrate how researchers and practitioners can use MeMemo to easily develop client-side applications that take advantage of on-device RAG.

2.1. Developing In-browser RAG Tools

Motivations. Assume an example scenario where Mei, a machine learning (ML) consultant, is currently developing an LLM-based chatbot for a large design studio. The chatbot’s purpose is to assist newly hired designers in familiarizing themselves with the company’s internal design systems and tools. To ensure accurate and reliable responses, Mei integrates RAG into this onboarding chatbot. This integration allows the responses to be grounded by relevant documentation, design documents, and code. Initially, Mei uses Jupyter Notebooks (Kluyver et al., 2016) to prototype the chatbot through prompt engineering in Python. However, she realizes that this workflow is not ideal for collaborating with designers and introducing RAG to her clients, because many of the collaborators and stakeholders are not experienced in programming and setting up notebook environments. Therefore, Mei decides to develop RAG Playground (Fig. 1), a web-based no-code RAG prototyping tool. This tool will enable her collaborators, who come from diverse backgrounds, to easily access and prototype RAG features for their chatbot through their web browsers.

import { HNSW } from 'mememo';

// Creating a new index
const index = new HNSW({ distanceFunction: 'cosine' });

// Inserting elements, keys: string[], values: number[][]
await index.bulkInsert(keys, values);

// Find k-nearest neighbors, query: number[], k: number
// The result holds keys: string[] and distances: number[];
// `keys` is renamed here so it does not clash with the inserted keys above
const { keys: neighborKeys, distances } = await index.query(query, k);
Code 1: Example TypeScript code that uses MeMemo to create an HNSW index and search for k-nearest neighbors.

Vector storage and retrieval with MeMemo. Mei uses MeMemo, a JavaScript library, to enable dense vector storage and retrieval directly in the browser. By installing the library with a single command npm install mememo, Mei can easily import it into her web app, regardless of her web development stack (e.g., JavaScript, TypeScript, React (Facebook, 2013), Svelte (Harris, 2016), or Lit (Google, 2015)). With just a few lines of code (Code 1), Mei can create an HNSW vector index (Malkov and Yashunin, 2020) and efficiently search through millions of embedding vectors entirely within her browser. Mei also uses MeMemo’s exportIndex() and loadIndex() functions to export an index she has created into persistent local storage or as a JSON file. This allows her collaborators to quickly load the HNSW index without the need to recreate it every time they use RAG Playground.

Smooth integration with existing Web ML technologies. Mei seamlessly integrates MeMemo with other Web ML technologies. For example, she uses IndexedDB, a client-side key-value browser storage, to store the raw documents. Using the same keys, Mei creates the HNSW index with MeMemo. Then, Mei uses FlexSearch (Wilkerling, 2019) to implement fast full-text lexical search in the browser. To enable semantic search, Mei first uses GTE-Small (Li et al., 2023b) to encode all documents into dense vectors with 384 dimensions in Python with SentenceTransformers (Reimers and Gurevych, 2019). For encoding the user’s query (Fig. 1A), Mei uses ONNX (Bai et al., 2019) and Transformers.js (Lochner, 2023) to run the same GTE-Small model in the browser. After augmenting a text prompt with retrieved documents (Fig. 1C), Mei runs the prompt with open-source LLMs, such as Llama 2 (Touvron et al., 2023) and Phi 2 (Abdin et al., 2023), in the browser through Web LLM (MLC, 2023). By combining MeMemo with existing Web ML technologies, Mei quickly develops RAG Playground and shares it with her collaborators. With this tool, Mei’s team makes rapid progress, as stakeholders from diverse backgrounds can easily experiment with different user queries and prompts to improve their onboarding chatbot.

2.2. Prototyping with RAG Playground

Motivations. Robaire, a graduate student studying human-computer interaction, is designing an interactive visualization tool to assist researchers in brainstorming and literature review. After discovering RAG online, Robaire becomes interested in integrating it into his prototype. The objective is to allow users to input a large corpus of academic papers and use natural language queries to discover related papers and visualize the connections between them. Since Robaire has never implemented RAG before, he turns to RAG Playground to learn about the concept and prototype for his tool.

[Figure: RAG Playground used with remote and local LLMs (e.g., GPT 4 and Llama 2).]

Learning and experimenting with RAG. After opening RAG Playground in the browser, Robaire creates a MeMemo database (Fig. 1B) by uploading a JSON file containing the abstracts of 120k arXiv ML papers and 384-dimensional embeddings of the abstracts. Robaire then takes the role of his end-users and types a natural language query in the User Query View, such as “how to integrate information retrieval into ML?” (Fig. 1A). In addition, he writes a simple system prompt template in the Prompt View (Fig. 1C) with placeholders {{user}} and {{context}}. After clicking the button in the interface to run the query, Robaire sees 10 relevant paper abstracts with their cosine distances highlighted in the Database View (Fig. 1B). He also finds that the two placeholders in the Prompt View are replaced with the user query and relevant documents. Robaire then sees the LLM’s output in the Output View (Fig. 1D). Finding the output helpful and grounded by the documents retrieved by MeMemo, Robaire experiments with more prompts and both remote and local LLMs (e.g., GPT 4 and Llama 2 shown in the figure above) in RAG Playground and gains a better understanding of RAG. This increased understanding gives him more confidence to implement RAG in his tool.
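The placeholder substitution described in this scenario can be sketched as follows. This is a minimal illustration, not RAG Playground’s actual code: the function name fillPromptTemplate and the numbered-list formatting of the retrieved documents are assumptions made for the example; only the {{user}} and {{context}} placeholders come from the text above.

```typescript
// Hypothetical helper (not part of MeMemo or RAG Playground): replaces the
// {{user}} and {{context}} placeholders in a prompt template.
function fillPromptTemplate(
  template: string,
  userQuery: string,
  documents: string[]
): string {
  // Assumption: retrieved documents are joined as a numbered list.
  const context = documents.map((doc, i) => `[${i + 1}] ${doc}`).join('\n');

  // A single pass with a replacer function, so that "$" sequences or
  // placeholder-like text inside the query or documents are left untouched.
  return template.replace(
    /\{\{(user|context)\}\}/g,
    (_match: string, name: string) => (name === 'user' ? userQuery : context)
  );
}

const prompt = fillPromptTemplate(
  'Answer the question using the context.\nContext:\n{{context}}\nQuestion: {{user}}',
  'how to integrate information retrieval into ML?',
  ['Abstract of paper A', 'Abstract of paper B']
);
console.log(prompt);
```

In an on-device RAG flow, the documents array would hold the texts looked up from the keys that the vector index returns for the encoded query.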

3. MeMemo Design and Implementation

MeMemo is the first JavaScript toolkit that enables dense retrieval in the browser. To enable fast and reliable retrieval for RAG, our tool adapts the state-of-the-art approximate nearest neighbor search technique HNSW (§ 3.1). MeMemo leverages modern and native Web technologies, such as IndexedDB and Web Workers (§ 3.2), to optimize for browser environments. To help researchers and developers adopt MeMemo, we have open-sourced it and provided detailed documentation, tutorials, and an example application (§ 3.3).

3.1. Adapting HNSW

HNSW is a state-of-the-art approximate k-nearest neighbor search technique introduced by Malkov and Yashunin (2020). It is inspired by the greedy graph routing used in navigable small world networks (Kleinberg, 2000; Boguñá et al., 2009) and the stochastic hierarchical structure of 1D probabilistic skip lists (Pugh, 1990). HNSW uses a multilayered graph structure to connect high-dimensional dense vectors. During the insertion process, each new element is assigned a layer level at random, determining its position within the graph’s multi-layered hierarchy. The insertion process involves finding the element’s closest neighbors, starting from the top layer and working downwards using a greedy search approach. When searching for the nearest neighbors of a query element, the algorithm follows a similar procedure. It starts from the top layer and uses the connections established during the insertion phase to guide its search downwards.
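The two ingredients described above, random layer assignment and greedy top-down routing, can be sketched as follows. This is a simplified illustration under stated assumptions, not MeMemo’s implementation: all names (randomLevel, greedySearchLayer, searchNearest) are hypothetical, the level formula floor(-ln(U) · mL) with mL = 1/ln(M) follows the HNSW paper, and the search keeps only a single best candidate per layer, whereas full HNSW maintains a candidate list of configurable size.

```typescript
type Vector = number[];
type Layer = Map<string, string[]>; // node key -> neighbor keys on this layer

// Cosine distance = 1 - cosine similarity.
function cosineDistance(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Draw a random layer level as floor(-ln(U) * mL), with U uniform in (0, 1].
// Higher levels become exponentially less likely, as in a skip list.
function randomLevel(m: number, random: () => number = Math.random): number {
  const mL = 1 / Math.log(m);
  const u = 1 - random(); // maps [0, 1) to (0, 1], avoiding log(0)
  return Math.floor(-Math.log(u) * mL);
}

// Greedy routing on one layer: keep moving to a closer neighbor until
// no neighbor improves on the current node.
function greedySearchLayer(
  query: Vector,
  entry: string,
  vectors: Map<string, Vector>,
  layer: Layer
): string {
  let current = entry;
  let currentDist = cosineDistance(query, vectors.get(current)!);
  let improved = true;
  while (improved) {
    improved = false;
    for (const neighbor of layer.get(current) ?? []) {
      const d = cosineDistance(query, vectors.get(neighbor)!);
      if (d < currentDist) {
        current = neighbor;
        currentDist = d;
        improved = true;
      }
    }
  }
  return current;
}

// Descend from the top layer (last array element) to layer 0, using the
// result of each layer as the entry point of the next one.
function searchNearest(
  query: Vector,
  topEntry: string,
  layers: Layer[],
  vectors: Map<string, Vector>
): string {
  let entry = topEntry;
  for (let l = layers.length - 1; l >= 0; l--) {
    entry = greedySearchLayer(query, entry, vectors, layers[l]);
  }
  return entry;
}
```

Because the routing is greedy, the result is approximate: the search can stop at a local minimum, which is why practical implementations track several candidates on the bottom layer.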

We use HNSW as the approximate nearest neighbor search technique in MeMemo because it is the state-of-the-art regarding construction and query efficiency (Malkov and Yashunin, 2020). Additionally, HNSW has gained immense popularity among retrieval and AI practitioners and has been integrated into popular retrieval and RAG Python toolkits such as FAISS (Douze et al., 2024), Pyserini (Lin et al., 2021), PGVector (Kane, 2021), and LangChain (Chase, 2022). Our goal with MeMemo is to seamlessly integrate into users’ existing workflows and preferences, providing a smooth and familiar experience when developing in-browser retrieval applications.

3.2. Optimizing for the Browsers

Memory management. Memory management is one of the main challenges for developing in-browser toolkits. Depending on the device and browser, a webpage tab might have a RAM limit as low as 256MB (Maitre, 2018). This means that without considering any other memory usage on a webpage, it can store at most 83k 384-dimensional vectors in RAM. Additionally, for security reasons, browsers do not allow access to the operating system’s file systems, so MeMemo cannot directly store data on the user’s disk. To overcome these challenges, MeMemo leverages IndexedDB (MDN, 2021), a cross-browser key-value storage that can use up to 80% of the client’s disk size (MDN, 2023a). MeMemo stores all vector values in IndexedDB, while only keeping the keys and HNSW graphs in RAM.
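The 83k figure can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes that each vector component is stored as an 8-byte double-precision number (the default JavaScript number representation) and that 256MB means 256 × 10^6 bytes; the text does not state these assumptions explicitly, and the function name is hypothetical.

```typescript
// Hypothetical helper: how many vectors fit into a given memory budget,
// ignoring keys, graph structure, and any other memory use on the page.
function maxVectorsInMemory(
  memoryBytes: number,
  dimensions: number,
  bytesPerValue: number = 8 // assumption: float64, as in plain JS numbers
): number {
  return Math.floor(memoryBytes / (dimensions * bytesPerValue));
}

// 256 * 10^6 bytes / (384 dimensions * 8 bytes) = 83,333 vectors (about 83k)
console.log(maxVectorsInMemory(256 * 1e6, 384));

// With 4-byte floats (e.g., Float32Array) the same budget holds twice as many.
console.log(maxVectorsInMemory(256 * 1e6, 384, 4));
```

Either way, a corpus with millions of vectors exceeds such a budget, which motivates keeping the vector values outside RAM.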

Prefetching for efficient data access. While IndexedDB addresses the memory constraints in the browser, reading or writing a large amount of data to IndexedDB with consecutive transactions is extremely slow (RxDB, 2021). Dexie.js (Fahlander, 2021) introduces techniques for fast batched read and write to IndexedDB. However, the HNSW construction process requires consecutive reads and writes of vector values, as the algorithm relies on the previously constructed index for finding good neighbors (Mendel-Gleason, 2024; Malkov and Yashunin, 2020). To address this challenge, MeMemo introduces a prefetching mechanism. When inserting multiple elements, MeMemo first uses a batched write to store all vectors in IndexedDB. During construction and search, MeMemo maintains a cache of p vector values in RAM. If it needs to read a vector value that is not in the cache, MeMemo prefetches p neighbors of that element on the current graph layer from IndexedDB to RAM. This mechanism reduces the number of IndexedDB transactions. The parameter p is automatically determined by the vector dimension and can be configured by users.
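The prefetching idea can be illustrated with a small cache that turns a miss into one batched read. This is a sketch under stated assumptions rather than MeMemo’s code: the VectorStore interface, the InMemoryStore stand-in (used in place of IndexedDB so the example runs anywhere), the class name, and the oldest-first eviction policy are all hypothetical; the text only specifies a cache of p vectors and a prefetch of neighbors on the current layer.

```typescript
type Vector = number[];

// Assumed minimal async storage interface; in a browser, an implementation
// could serve getMany() with a single IndexedDB transaction.
interface VectorStore {
  getMany(keys: string[]): Promise<Map<string, Vector>>;
}

// Stand-in for IndexedDB that counts how many batched reads were issued.
class InMemoryStore implements VectorStore {
  reads = 0;
  private data: Map<string, Vector>;

  constructor(data: Map<string, Vector>) {
    this.data = data;
  }

  async getMany(keys: string[]): Promise<Map<string, Vector>> {
    this.reads += 1;
    const result = new Map<string, Vector>();
    for (const key of keys) {
      const value = this.data.get(key);
      if (value !== undefined) result.set(key, value);
    }
    return result;
  }
}

class PrefetchingVectorCache {
  private cache = new Map<string, Vector>();
  private store: VectorStore;
  private prefetchSize: number; // the parameter p

  constructor(store: VectorStore, prefetchSize: number) {
    this.store = store;
    this.prefetchSize = prefetchSize;
  }

  // Returns the vector for `key`. On a cache miss, the key and its uncached
  // neighbors on the current layer (at most p keys in total) are fetched in
  // one batched read.
  async get(key: string, layerNeighbors: string[]): Promise<Vector> {
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const toFetch = [key];
    for (const neighbor of layerNeighbors) {
      if (toFetch.length >= this.prefetchSize) break;
      if (neighbor !== key && !this.cache.has(neighbor)) toFetch.push(neighbor);
    }
    const fetched = await this.store.getMany(toFetch);

    for (const [k, v] of fetched) {
      // Assumption: evict the oldest entry once the cache holds p vectors.
      if (this.cache.size >= this.prefetchSize) {
        const oldest = this.cache.keys().next().value;
        if (oldest !== undefined) this.cache.delete(oldest);
      }
      this.cache.set(k, v);
    }

    const value = fetched.get(key);
    if (value === undefined) throw new Error(`Vector not found: ${key}`);
    return value;
  }
}
```

In this sketch, visiting a node and then its neighbors costs one storage read instead of one read per vector, which is the effect the prefetching mechanism aims for during graph traversal.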

3.3. Open-source and Easy to Use

To help researchers and developers easily adopt MeMemo, we open source our implementation and design APIs similar to popular HNSW Python libraries (e.g., Malkov and Yashunin, 2020; Douze et al., 2024; Zhu, 2016). Users can easily configure all HNSW parameters, such as M (the number of neighbors a graph node can have) and efConstruction (the number of nodes to search during construction). With just a few lines of code (Code 1), users can quickly implement dense retrieval in web browsers using MeMemo. We provide detailed documentation and tutorials. Additionally, we offer an open-source example application RAG Playground that demonstrates the integration of MeMemo with existing Web ML technologies (§ 2.1). RAG Playground also shows how to use MeMemo with modern Web APIs, including Web Workers (MDN, 2023c) to prevent blocking the main thread and Streams API (MDN, 2023b) for creating an HNSW index incrementally with small network-received chunks. MeMemo is published in the popular Web package repository npm Registry, and can be easily installed and used in both browser and Node.js (Dahl, 2009) environments.

4. Related Work

Retrieval-augmented text generation. There has been a long history of using information retrieval to enhance text generation, such as developing language models through retrieval (Lavrenko and Croft, 2001), using a retrieve-and-edit framework to improve code generation (Hashimoto et al., 2018), and incorporating knowledge graphs to enhance language representation in language models (Zhang et al., 2019). The concept of RAG was popularized by Lewis et al. (2020), who introduced a model that combines a dense passage retriever and sequence-to-sequence models. More recent approaches (e.g., Neelakantan et al., 2022; Qu et al., 2021; Izacard and Grave, 2021; Cuconasu et al., 2024) use pre-trained embedding models to encode external documents as dense vectors and retrieve relevant documents using dense retrievers such as HNSW (Malkov and Yashunin, 2020), PQ (Jégou et al., 2011), and FAISS (Douze et al., 2024). MeMemo builds upon these works and extends RAG to the client side for more private and personalized text generation.

On-device retrieval and machine learning. Traditional retrieval and machine learning (ML) systems are typically deployed on remote servers, and their outputs are sent to client devices. However, there has been a recent surge of interest in deploying ML models directly on edge devices in the pursuit of private, ubiquitous, and interactive ML experiences. Tools such as TensorFlow.js (Smilkov et al., 2019), ONNX (Bai et al., 2019), MLC (MLC, 2023; Chen et al., 2018), and Core ML (Apple, 2017) have significantly reduced the barriers to running complex ML models in browsers and mobile devices. Researchers have proposed various on-device systems, including information retrieval (Kamvar et al., 2009; Lam et al., 2023), recommender systems (Gong et al., 2020; Xia et al., 2023), prediction explanation (Wang et al., 2022, 2023b; Wang and Chau, 2023), speech recognition (Macoskey et al., 2021b, a), translation (Tan et al., 2022), and writing assistants (Wang et al., 2024). Our tool contributes to the growing body of on-device ML research by introducing the first adaptation of dense retrieval to browsers.

5. Discussion and Future Work

Reflecting on our development of MeMemo, we highlight the opportunities and challenges for in-browser dense retrieval.

Opportunities. Enabling dense retrieval and RAG in browsers offers significant advantages regarding privacy, ubiquity, and interactivity. With the browser’s ubiquity, MeMemo is accessible on various devices, including laptops, mobile phones, and IoT appliances like smart refrigerators. Future research directions include:

  • Intelligent personal information management. There is a large body of research on collecting all of one’s personal information into a searchable database (e.g., Freeman and Gelernter, 1996; Cai et al., 2005; Bell, 2001; Chau et al., 2008; Kiesel et al., 2018). Researchers can leverage on-device dense storage and retrieval to design browser extensions that automatically and privately encode and store a user’s visited web pages, photos, and academic papers. These extensions can serve as an intelligent “second brain” (Forte, 2022) to help users capture and review knowledge.

  • Private and personalized content creation. If users maintain a personal vector database in browsers, content creators, such as book writers, can use on-device RAG to tailor their content privately based on readers’ preferences and reading history.

  • Interactive RAG prototyping. Future researchers can enhance the design of RAG Playground to improve the interactive RAG prototyping experience, such as supporting collaborative prompt editing (Feng et al., 2023) and interactive embedding visualizations (Wang et al., 2023a).

Challenges. Due to limited computation resources in browsers, MeMemo is slower than heavily optimized libraries like HNSWLIB (Malkov and Yashunin, 2020) in terms of index creation and search. In Chrome on a 64GB RAM MacBook, it took about 94 minutes to insert 1 million 384-dimensional vectors (M=5, efConstruction=20). However, querying this index with 1M items is still performed in real time. Future researchers can optimize in-browser dense retrieval further by implementing parallelization and smarter prefetching techniques.

Conclusions. We present MeMemo, an open-source library that enables in-browser dense retrieval using HNSW and modern Web technologies. We introduce RAG Playground, a novel client-side RAG prototyping tool to demonstrate the capabilities of MeMemo. We hope MeMemo will be an easy-to-use resource for the information retrieval and ML community, inspiring future research and development of on-device retrieval and RAG applications.

References

  • Abdin et al. (2023) Marah Abdin, Jyoti Aneja, Sebastien Bubeck, Caio César Teodoro Mendes, Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, Suriya Gunasekar, Mojan Javaheripi, Piero Kauffmann, Yin Tat Lee, Yuanzhi Li, Anh Nguyen, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Michael Santacroce, Harkirat Singh Behl, Adam Taumann Kalai, Xin Wang, Rachel Ward, Philipp Witte, Cyril Zhang, and Yi Zhang. 2023. Phi-2: The Surprising Power of Small Language Models. (2023). https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
  • Apple (2017) Apple. 2017. Core ML: Integrate Machine Learning Models into Your App. https://developer.apple.com/documentation/coreml
  • Bai et al. (2019) Junjie Bai, Fang Lu, and Ke Zhang. 2019. ONNX: Open Neural Network Exchange. https://github.com/onnx/onnx
  • Balaguer et al. (2024) Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp, Bruno Silva, Swati Sharma, Vijay Aski, and Ranveer Chandra. 2024. RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture. (2024). https://doi.org/10.48550/ARXIV.2401.08406
  • Bell (2001) Gordon Bell. 2001. A Personal Digital Store. Commun. ACM 44 (2001). https://doi.org/10.1145/357489.357513
  • Boguñá et al. (2009) Marián Boguñá, Dmitri Krioukov, and K. C. Claffy. 2009. Navigability of Complex Networks. Nature Physics 5 (2009). https://doi.org/10.1038/nphys1130
  • Cai et al. (2005) Yuhan Cai, Xin Luna Dong, Alon Halevy, Jing Michelle Liu, and Jayant Madhavan. 2005. Personal Information Management with SEMEX. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/1066157.1066289
  • Chase (2022) Harrison Chase. 2022. LangChain: Building Applications with LLMs through Composability. https://github.com/langchain-ai/langchain
  • Chau et al. (2008) Duen Horng Chau, Brad Myers, and Andrew Faulring. 2008. What to Do When Search Fails: Finding Information by Association. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/1357054.1357208
  • Chen et al. (2018) Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). https://www.usenix.org/conference/osdi18/presentation/chen
  • Chung et al. (2023) Neo Christopher Chung, George Dyer, and Lennart Brocki. 2023. Challenges of Large Language Models for Mental Health Counseling. arXiv 2311.13857 (2023). http://arxiv.longhoe.net/abs/2311.13857
  • Cuconasu et al. (2024) Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. The Power of Noise: Redefining Retrieval for RAG Systems. arXiv 2401.14887 (2024). http://arxiv.longhoe.net/abs/2401.14887
  • Dahl (2009) Ryan Dahl. 2009. Node.js: An Open-Source, Cross-Platform JavaScript Runtime Environment. (2009). https://nodejs.org/en/
  • Douze et al. (2024) Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss Library. arXiv 2401.08281 (2024). http://arxiv.longhoe.net/abs/2401.08281
  • Draxler et al. (2023) Fiona Draxler, Daniel Buschek, Mikke Tavast, Perttu Hämäläinen, Albrecht Schmidt, Juhi Kulshrestha, and Robin Welsch. 2023. Gender, Age, and Technology Education Influence the Adoption and Appropriation of LLMs. arXiv 2310.06556 (2023). http://arxiv.longhoe.net/abs/2310.06556
  • Facebook (2013) Facebook. 2013. React: The Library for Web and Native User Interfaces. https://react.dev/
  • Fahlander (2021) David Fahlander. 2021. Dexie.js - Minimalistic IndexedDB Wrapper. https://dexie.org/
  • Feng et al. (2023) Felicia Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, and Zhicong Lu. 2023. CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming. arXiv 2310.09235 (2023). http://arxiv.longhoe.net/abs/2310.09235
  • Forte (2022) Tiago Forte. 2022. Building a Second Brain: A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential (First Atria Books hardcover ed.).
  • Freeman and Gelernter (1996) Eric Freeman and David Gelernter. 1996. Lifestreams: A Storage Model for Personal Data. ACM SIGMOD Record 25 (1996). https://doi.org/10.1145/381854.381893
  • Fuchsbauer et al. (2021) Georg Fuchsbauer, Riddhi Ghosal, Nathan Hauke, and Adam O’Neill. 2021. Approximate Distance-Comparison-Preserving Symmetric Encryption. Cryptology ePrint Archive, Paper 2021/1666. https://eprint.iacr.org/2021/1666
  • Ghodratnama and Zakershahrak (2023) Samira Ghodratnama and Mehrdad Zakershahrak. 2023. Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications. arXiv 2311.12287 (2023). http://arxiv.longhoe.net/abs/2311.12287
  • Gong et al. (2020) Yu Gong, Ziwen Jiang, Yufei Feng, Binbin Hu, Kaiqi Zhao, Qingwen Liu, and Wenwu Ou. 2020. EdgeRec: Recommender System on Edge in Mobile Taobao. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. https://doi.org/10.1145/3340531.3412700
  • Google (2015) Google. 2015. Lit: Simple Fast Web Components. https://lit.dev/
  • Harris (2016) Rich Harris. 2016. Svelte: Cybernetically Enhanced Web Apps. https://svelte.dev/
  • Hashimoto et al. (2018) Tatsunori B Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S Liang. 2018. A Retrieve-and-Edit Framework for Predicting Structured Outputs. Advances in Neural Information Processing Systems 31 (2018).
  • Hsieh et al. (2023) Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, and Tomas Pfister. 2023. Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models. arXiv 2308.00675 (2023). http://arxiv.longhoe.net/abs/2308.00675
  • Hu et al. (2021) Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2106.09685 (2021). http://arxiv.longhoe.net/abs/2106.09685
  • Izacard and Grave (2021) Gautier Izacard and Edouard Grave. 2021. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. https://doi.org/10.18653/v1/2021.eacl-main.74
  • Jégou et al. (2011) H Jégou, M Douze, and C Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2011). https://doi.org/10.1109/TPAMI.2010.57
  • Kamvar et al. (2009) Maryam Kamvar, Melanie Kellar, Rajan Patel, and Ya Xu. 2009. Computers and Iphones and Mobile Phones, Oh My!: A Logs-Based Comparison of Search Users on Different Devices. In Proceedings of the 18th International Conference on World Wide Web. https://doi.org/10.1145/1526709.1526817
  • Kane (2021) Andrew Kane. 2021. Pgvector: Open-source Vector Similarity Search for Postgres. pgvector. https://github.com/pgvector/pgvector
  • Kiesel et al. (2018) Johannes Kiesel, Arjen P de Vries, Matthias Hagen, Benno Stein, and Martin Potthast. 2018. WASP: Web Archiving and Search Personalized. In DESIRES.
  • Kleinberg (2000) Jon M. Kleinberg. 2000. Navigation in a Small World. Nature 406 (2000). https://doi.org/10.1038/35022643
  • Kluyver et al. (2016) Thomas Kluyver, Benjamin Ragan-Kelley, Fernando Pérez, Brian E Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica B Hamrick, Jason Grout, and Sylvain Corlay. 2016. Jupyter Notebooks-a Publishing Format for Reproducible Computational Workflows. 2016 (2016). https://doi.org/10.3233/978-1-61499-649-1-87
  • Lam et al. (2023) Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, and G. Edward Suh. 2023. GPU-based Private Information Retrieval for On-Device Machine Learning Inference. arXiv 2301.10904 (2023). http://arxiv.longhoe.net/abs/2301.10904
  • Lavrenko and Croft (2001) Victor Lavrenko and W. Bruce Croft. 2001. Relevance Based Language Models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. https://doi.org/10.1145/383952.383972
  • Lester et al. (2021) Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The Power of Scale for Parameter-Efficient Prompt Tuning. arXiv 2104.08691 (2021). http://arxiv.longhoe.net/abs/2104.08691
  • Lewis et al. (2020) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems 33 (2020).
  • Li et al. (2023a) Chaofan Li, Zheng Liu, Shitao Xiao, Yingxia Shao, Defu Lian, and Zhao Cao. 2023a. LibVQ: A Toolkit for Optimizing Vector Quantization and Efficient Neural Retrieval. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. https://doi.org/10.1145/3539618.3591799
  • Li et al. (2023b) Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. 2023b. Towards General Text Embeddings with Multi-stage Contrastive Learning. arXiv 2308.03281 (2023). https://arxiv.org/abs/2308.03281
  • Lin et al. (2021) Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira. 2021. Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. https://doi.org/10.1145/3404835.3463238
  • Lochner (2023) Joshua Lochner. 2023. Transformers.js: State-of-the-art Machine Learning for the Web. https://github.com/xenova/transformers.js
  • Macoskey et al. (2021a) Jonathan Macoskey, Grant Strimel, and Ariya Rastrow. 2021a. Learning a Neural Diff for Speech Models. In Interspeech 2021. https://www.amazon.science/publications/learning-a-neural-diff-for-speech-models
  • Macoskey et al. (2021b) Jonathan Macoskey, Grant Strimel, Jinru Su, and Ariya Rastrow. 2021b. Amortized Neural Networks for Low-Latency Speech Recognition. In Interspeech 2021. https://www.amazon.science/publications/amortized-neural-networks-for-low-latency-speech-recognition
  • Maitre (2018) Ogier Maitre. 2018. Total Canvas Memory Use Exceeds the Maximum Limit (Safari 12) - Stack Overflow. https://stackoverflow.com/questions/52532614/total-canvas-memory-use-exceeds-the-maximum-limit-safari-12
  • Malkov and Yashunin (2020) Yu A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020). https://doi.org/10.1109/TPAMI.2018.2889473
  • Martineau (2023) Kim Martineau. 2023. What Is Retrieval-Augmented Generation? https://research.ibm.com/blog/retrieval-augmented-generation-RAG
  • MDN (2021) MDN. 2021. IndexedDB API - Web APIs. https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API
  • MDN (2023a) MDN. 2023a. Storage Quotas and Eviction Criteria - Web APIs | MDN. https://developer.mozilla.org/en-US/docs/Web/API/Storage_API/Storage_quotas_and_eviction_criteria
  • MDN (2023b) MDN. 2023b. Streams API - Web APIs. https://developer.mozilla.org/en-US/docs/Web/API/Streams_API
  • MDN (2023c) MDN. 2023c. Web Workers API - Web APIs. https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API
  • Mendel-Gleason (2024) Gavin Mendel-Gleason. 2024. Parallelising HNSW. https://github.com/GavinMendelGleason/blog/blob/main/entries/parallelising_hnsw.md
  • MLC (2023) Team MLC. 2023. MLC-LLM. https://github.com/mlc-ai/mlc-llm
  • Neelakantan et al. (2022) Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, and Lilian Weng. 2022. Text and Code Embeddings by Contrastive Pre-Training. (2022). https://doi.org/10.48550/ARXIV.2201.10005
  • OpenAI (2023) OpenAI. 2023. GPT-4 Technical Report. arXiv 2303.08774 (2023). https://arxiv.org/abs/2303.08774
  • Ovadia et al. (2024) Oded Ovadia, Menachem Brief, Moshik Mishaeli, and Oren Elisha. 2024. Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs. arXiv 2312.05934 (2024). https://arxiv.org/abs/2312.05934
  • Prince et al. (2023) Michael H. Prince, Henry Chan, Aikaterini Vriza, Tao Zhou, Varuni K. Sastry, Matthew T. Dearing, Ross J. Harder, Rama K. Vasudevan, and Mathew J. Cherukara. 2023. Opportunities for Retrieval and Tool Augmented Large Language Models in Scientific Facilities. arXiv 2312.01291 (2023). https://arxiv.org/abs/2312.01291
  • Pugh (1990) William Pugh. 1990. Skip Lists: A Probabilistic Alternative to Balanced Trees. Commun. ACM 33 (1990). https://doi.org/10.1145/78973.78977
  • Qu et al. (2021) Chen Qu, Hamed Zamani, Liu Yang, W. Bruce Croft, and Erik Learned-Miller. 2021. Passage Retrieval for Outside-Knowledge Visual Question Answering. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. https://doi.org/10.1145/3404835.3462987
  • Reimers and Gurevych (2019) Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/abs/1908.10084
  • Ruan et al. (2023) Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Guoqing Du, Shiwei Shi, Hangyu Mao, Ziyue Li, Xingyu Zeng, and Rui Zhao. 2023. TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage. arXiv 2308.03427 (2023). https://arxiv.org/abs/2308.03427
  • RxDB (2021) RxDB. 2021. Why IndexedDB Is Slow and What to Use Instead. https://rxdb.info/slow-indexeddb.html
  • Semnani et al. (2023) Sina Semnani, Violet Yao, Heidi Zhang, and Monica Lam. 2023. WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia. In Findings of the Association for Computational Linguistics: EMNLP 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.157
  • Shuster et al. (2021) Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval Augmentation Reduces Hallucination in Conversation. arXiv 2104.07567 (2021). https://arxiv.org/abs/2104.07567
  • Smilkov et al. (2019) Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, Ping Yu, Kangyi Zhang, Shanqing Cai, Eric Nielsen, David Soergel, Stan Bileschi, Michael Terry, Charles Nicholson, Sandeep N. Gupta, Sarah Sirajuddin, D. Sculley, Rajat Monga, Greg Corrado, Fernanda B. Viégas, and Martin Wattenberg. 2019. TensorFlow.js: Machine Learning for the Web and Beyond. arXiv (2019). https://arxiv.org/abs/1901.05350
  • Soare et al. (2022) Elena Soare, Iain Mackie, and Jeffrey Dalton. 2022. DocuT5: Seq2seq SQL Generation with Table Documentation. arXiv 2211.06193 (2022). https://arxiv.org/abs/2211.06193
  • Tan et al. (2022) Zhixing Tan, Zeyuan Yang, Meng Zhang, Qun Liu, Maosong Sun, and Yang Liu. 2022. Dynamic Multi-Branch Layers for On-Device Neural Machine Translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2022). https://doi.org/10.1109/TASLP.2022.3153257
  • Touvron et al. (2023) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2307.09288 (2023). https://arxiv.org/abs/2307.09288
  • Wang et al. (2024) Zijie J. Wang, Aishwarya Chakravarthy, David Munechika, and Duen Horng Chau. 2024. Wordflow: Social Prompt Engineering for Large Language Models. arXiv 2401.14447 (2024). https://arxiv.org/abs/2401.14447
  • Wang and Chau (2023) Zijie J. Wang and Duen Horng Chau. 2023. WebSHAP: Towards Explaining Any Machine Learning Models Anywhere. In Companion Proceedings of the Web Conference 2023. https://doi.org/10.1145/3543873.3587362
  • Wang et al. (2023a) Zijie J. Wang, Fred Hohman, and Duen Horng Chau. 2023a. WizMap: Scalable Interactive Visualization for Exploring Large Machine Learning Embeddings. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). https://aclanthology.org/2023.acl-demo.50
  • Wang et al. (2022) Zijie J. Wang, Alex Kale, Harsha Nori, Peter Stella, Mark E. Nunnally, Duen Horng Chau, Mihaela Vorvoreanu, Jennifer Wortman Vaughan, and Rich Caruana. 2022. Interpretability, Then What? Editing Machine Learning Models to Reflect Human Knowledge and Values. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22). https://doi.org/10.1145/3534678.3539074
  • Wang et al. (2023b) Zijie J. Wang, Jennifer Wortman Vaughan, Rich Caruana, and Duen Horng Chau. 2023b. GAM Coach: Towards Interactive and User-centered Algorithmic Recourse. In CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3580816
  • Wilkerling (2019) Thomas Wilkerling. 2019. FlexSearch: Next-Generation Full Text Search Library for Browser and Node.js. https://github.com/nextapps-de/flexsearch
  • Wutschitz et al. (2023) Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, and Victor Rühle. 2023. Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective. arXiv 2311.15792 (2023). https://arxiv.org/abs/2311.15792
  • Xia et al. (2023) Xin Xia, Junliang Yu, Qinyong Wang, Chaoqun Yang, Nguyen Quoc Viet Hung, and Hongzhi Yin. 2023. Efficient On-Device Session-Based Recommendation. ACM Transactions on Information Systems (2023). https://doi.org/10.1145/3580364
  • Zamfirescu-Pereira et al. (2023) J.D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3581388
  • Zhang et al. (2019) Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. 2019. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1139
  • Zhou et al. (2023) Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, and Graham Neubig. 2023. DocPrompting: Generating Code by Retrieving the Docs. arXiv 2207.05987 (2023). https://arxiv.org/abs/2207.05987
  • Zhu (2016) Eric Zhu. 2016. Ekzhu/Datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW. https://github.com/ekzhu/datasketch