
Selective area epitaxy of in-plane HgTe nanostructures on CdTe(001) substrate

Authors: Nicolas Chaize, Xavier Baudry, Pierre-Henri Jouneau, Eric Gautier, Jean-Luc Rouvière, Yves Deblock, Jimmy Xu, Maxime Berthe, Clément Barbot, Bruno Grandidier, Ludovic Desplanque, Hermann Sellier, Philippe Ballet

Abstract: Semiconductor nanowires are believed to play a crucial role in future applications in electronics, spintronics and quantum technologies. HgTe is a potential candidate, but its sensitivity to nanofabrication processes hinders its development. One way to circumvent this obstacle is selective area growth. Here, in-plane HgTe nanostructures are grown by selective area molecular beam epitaxy on a semi-insulating CdTe substrate covered with a patterned SiO$_2$ mask. The shape of these nanostructures is determined by the in-plane orientation of the mask aperture along the $\langle 110\rangle$, $\langle 1\bar{1}0\rangle$, or $\langle 100\rangle$ direction, the deposited thickness, and the growth temperature. In-plane nanowires several microns long can be achieved, as well as more complex nanostructures such as networks, diamond structures or rings. Good selectivity is achieved, with very little parasitic growth on the mask, even for growth temperatures as low as 140 °C and growth rates up to 0.5 ML/s. For $\langle 110\rangle$-oriented nanowires, the center of the nanostructure exhibits a trapezoidal shape with $\{111\}$B facets and two grains on the sides, while $\langle 1\bar{1}0\rangle$-oriented nanowires show $\{111\}$A facets with adatom accumulation on the sides of the top surface. Transmission electron microscopy observations reveal a continuous epitaxial relation between the CdTe substrate and the HgTe nanowire.
Measurements of the resistance with four-point scanning tunneling microscopy indicate good electrical homogeneity along the main nanowire axis and thermally activated transport. This growth method paves the way toward the fabrication of complex HgTe-based nanostructures for electronic transport measurements.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 18 pages and 8 figures. Submitted to Nanotechnology
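As a rough sense of scale for the quoted growth rate, the conversion below turns 0.5 ML/s into a vertical growth rate. It assumes, beyond what the abstract states, that one monolayer on the (001) surface is half the zinc-blende lattice constant, with $a_{\mathrm{HgTe}} \approx 6.46$ Å; it is a back-of-the-envelope estimate, not a figure from the paper.

```latex
$$d_{\mathrm{ML}} = \frac{a_{\mathrm{HgTe}}}{2} \approx \frac{0.646\ \mathrm{nm}}{2} \approx 0.323\ \mathrm{nm}$$
$$0.5\ \mathrm{ML\,s^{-1}} \times 0.323\ \mathrm{nm\,ML^{-1}} \approx 0.16\ \mathrm{nm\,s^{-1}} \approx 9.7\ \mathrm{nm\,min^{-1}} \approx 0.58\ \mu\mathrm{m\,h^{-1}}$$
```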

arXiv:2407.07790 [pdf, other]

doi 10.1145/3626772.3657861

Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR

Authors: Nandan Thakur, Luiz Bonifacio, Maik Fröbe, Alexander Bondarenko, Ehsan Kamalloo, Martin Potthast, Matthias Hagen, Jimmy Lin

Abstract: The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark, a combination of different IR evaluation datasets. Interestingly, previous studies found that, particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what makes argument retrieval so "special". To analyze the potential limits of neural retrieval models more deeply, we run a reproducibility study on the Touché 2020 data. In our study, we focus on two experiments: (i) a black-box evaluation (i.e., no model retraining), incorporating a theoretical exploration using retrieval axioms, and (ii) a data denoising evaluation involving post-hoc relevance judgments. Our black-box evaluation reveals an inherent bias of neural models towards retrieving short passages from the Touché 2020 data, and we also find that quite a few of the neural models' results are unjudged in the Touché 2020 data. As many of the short Touché passages are not argumentative and thus non-relevant per se, and as the missing judgments complicate fair comparison, we denoise the Touché 2020 data by excluding very short passages (fewer than 20 words) and by augmenting the unjudged data with post-hoc judgments following the Touché guidelines. On the denoised data, the effectiveness of the neural models improves by up to 0.52 in nDCG@10, but BM25 is still more effective.
Our code and the augmented Touché 2020 dataset are available at https://github.com/castorini/touche-error-analysis.

Submitted 10 July, 2024; originally announced July 2024.

Comments: SIGIR 2024 (Resource & Reproducibility Track)
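The two denoising ingredients in this abstract, dropping passages under 20 words and scoring with nDCG@10 where unjudged results count as non-relevant, can be sketched as below. This is a minimal illustration, not the authors' released code: it assumes whitespace tokenization for the word count and the common linear-gain nDCG with a log2(rank + 1) discount; the document IDs and grades are toy values.

```python
import math
from typing import Dict, List

def denoise(passages: Dict[str, str], min_words: int = 20) -> Dict[str, str]:
    """Keep only passages with at least `min_words` whitespace-separated words."""
    return {pid: text for pid, text in passages.items() if len(text.split()) >= min_words}

def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gains; documents missing from qrels (unjudged) get gain 0."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

qrels = {"d1": 2, "d2": 1, "d3": 0}           # toy judgments, not Touché data
print(ndcg_at_k(["d1", "d2", "d3"], qrels))  # ideal order -> 1.0
print(ndcg_at_k(["d4", "d3", "d1"], qrels))  # d4 is unjudged, so it scores as non-relevant
```

The second call shows why missing judgments penalize a system: an unjudged but possibly relevant result contributes nothing, exactly as a known non-relevant one would.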

arXiv:2407.07279 [pdf, other]

Towards a theory of learning dynamics in deep state space models

Authors: Jakub Smékal, Jimmy T. H. Smith, Michael Kleinman, Dan Biderman, Scott W. Linderman

Abstract: State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and describe future work in extending our results to the study of deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space models.

Submitted 9 July, 2024; originally announced July 2024.
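To make the one-dimensional case concrete: a scalar linear SSM can be unrolled into a causal convolution whose kernel entries $c\,a^{j}\,b$ are products of the parameters. That multiplicative structure is one intuition for why its learning dynamics can resemble those of a deep linear network, though this sketch is an illustration rather than the paper's derivation and does not reproduce its frequency-domain analysis. It assumes discrete time and a zero initial state; the function names are hypothetical.

```python
from typing import List

def ssm_recurrence(a: float, b: float, c: float, u: List[float]) -> List[float]:
    """Scalar linear SSM: x_t = a * x_{t-1} + b * u_t, y_t = c * x_t, starting from x_{-1} = 0."""
    x, ys = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        ys.append(c * x)
    return ys

def ssm_kernel(a: float, b: float, c: float, length: int) -> List[float]:
    """Unrolling the recurrence gives y_t = sum_j (c * a**j * b) * u_{t-j}."""
    return [c * a ** j * b for j in range(length)]

def causal_conv(kernel: List[float], u: List[float]) -> List[float]:
    """y_t = sum_{j=0}^{t} kernel[j] * u[t - j]."""
    return [sum(kernel[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]

u = [1.0, 2.0, -1.0, 0.5]
y_rec = ssm_recurrence(0.9, 0.5, 2.0, u)
y_conv = causal_conv(ssm_kernel(0.9, 0.5, 2.0, len(u)), u)
print(y_rec)
print(y_conv)  # matches y_rec up to floating-point rounding
```

Because every kernel entry multiplies $c$, $b$ and a power of $a$, the gradient with respect to any one parameter is scaled by the others.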

arXiv:2407.04684 [pdf, other]

Investigating the Mass of the Black Hole and Possible Wind Outflow of the Accretion Disk in the Tidal Disruption Event AT2021ehb

Authors: Xin Xiang, Jon M. Miller, Abderahmen Zoghbi, Mark T. Reynolds, David Bogensberger, Lixin Dai, Paul A. Draghis, Jeremy J. Drake, Olivier Godet, Jimmy A. Irwin, Michael C. Miller, Brenna E. Mockler, Richard Saxton, Natalie Webb

Abstract: Tidal disruption events (TDEs) can potentially probe low-mass black holes in host galaxies that might not adhere to bulge or stellar-dispersion relationships. At least initially, TDEs can also reveal super-Eddington accretion. X-ray spectroscopy can potentially constrain black hole masses and reveal ionized outflows associated with super-Eddington accretion. Our analysis of XMM-Newton X-ray observations of the TDE AT2021ehb, around 300 days post-disruption, reveals a soft spectrum that can be fit with a combination of multi-color disk blackbody and power-law components. Using two independent disk models with properties suited to TDEs, we estimate a black hole mass of $M \simeq 10^{5.5}~M_{\odot}$, indicating that AT2021ehb may expose the elusive low-mass end of the nuclear black hole population. These models offer a simple yet robust characterization; more complicated models are not required, but they provide important context and caveats in the limit of moderately sensitive data. If disk reflection is included, the disk flux is lower and the inferred black hole masses are $\sim$0.35 dex higher. Simple wind formulations imply an extremely fast $v_{\mathrm{out}} = -0.2~c$ outflow and obviate a disk continuum component. Assuming a unity filling factor, such a wind implies an instantaneous mass outflow rate of $\dot{M} \simeq 5~M_{\odot}~{\rm yr}^{-1}$. Such a high rate suggests that the filling factor for the ultra-fast outflow (UFO) must be extremely low and/or that the UFO phase is ephemeral.
We discuss the strengths and limitations of our analysis and avenues for future observations of TDEs.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 19 pages, 4 figures
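For orientation, the quoted logarithmic and normalized quantities convert to linear values as follows. This is plain arithmetic on the abstract's numbers, assuming $c \approx 3.00\times10^{5}$ km s$^{-1}$, $M_\odot \approx 1.99\times10^{33}$ g and 1 yr $\approx 3.16\times10^{7}$ s; it adds no results beyond the paper's.

```latex
$$M \simeq 10^{5.5}\,M_\odot \approx 3.2\times10^{5}\,M_\odot$$
$$10^{5.5+0.35}\,M_\odot = 10^{5.85}\,M_\odot \approx 7.1\times10^{5}\,M_\odot \quad (\text{a factor } 10^{0.35}\approx 2.2)$$
$$|v_{\mathrm{out}}| = 0.2\,c \approx 6.0\times10^{4}\ \mathrm{km\,s^{-1}}$$
$$\dot{M} \simeq 5\,M_\odot\,\mathrm{yr^{-1}} \approx \frac{5\times1.99\times10^{33}\ \mathrm{g}}{3.16\times10^{7}\ \mathrm{s}} \approx 3.1\times10^{26}\ \mathrm{g\,s^{-1}}$$
```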

arXiv:2407.04069 [pdf, other]

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.18762 [pdf, other]

Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism

Authors: Shi Zong, Jimmy Lin

Abstract: A large number of benchmarks have been proposed to evaluate how large language models (LLMs) behave on logical inference tasks. However, it remains an open question how to properly evaluate this ability. In this paper, we provide a systematic overview of prior work on the logical reasoning ability of LLMs for analyzing categorical syllogisms. We first investigate all the possible variations of categorical syllogisms from a purely logical perspective and then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets. Our results indicate that, compared to template-based synthetic datasets, crowdsourcing approaches normally sacrifice coverage of the configurations (i.e., mood and figure) of categorical syllogisms for more language variation, which makes it challenging to fully test LLMs under different situations. We then summarize the findings and observations in the current literature on how well LLMs infer the validity of syllogisms. The error rate breakdown analyses suggest that the interpretation of quantifiers seems to be the current bottleneck limiting LLM performance and is thus worth more attention. Finally, we discuss several points that researchers might consider when planning future releases of categorical syllogism datasets.
We hope our work will not only provide a timely review of the current literature regarding categorical syllogisms, but also motivate more interdisciplinary research between communities, specifically computational linguists and logicians.

Submitted 26 June, 2024; originally announced June 2024.
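The configuration space mentioned in this abstract is small enough to enumerate exhaustively: each of the three propositions has one of four types (A, E, I, O), giving 4^3 = 64 moods, and the middle term can be arranged in four figures, giving 256 forms. The sketch below is an illustration, not the paper's code; it enumerates the forms and checks validity by brute force over all 2^8 Venn-region models. It assumes the modern Boolean reading (no existential import), under which 15 forms are valid; an Aristotelian reading that presupposes non-empty terms admits 24, so dataset labels may depend on this choice.

```python
from itertools import product
from typing import List, Tuple

Region = Tuple[bool, bool, bool]  # membership in (S, M, P) for one Venn region
TERM_INDEX = {"S": 0, "M": 1, "P": 2}
REGIONS: List[Region] = list(product([False, True], repeat=3))

def holds(kind: str, x: str, y: str, model: List[Region]) -> bool:
    """Truth of the proposition of type `kind` with subject x and predicate y.

    A model is the list of non-empty Venn regions (Boolean reading: terms may be empty).
    """
    i, j = TERM_INDEX[x], TERM_INDEX[y]
    if kind == "A":  # All x are y: no non-empty region is x but not y
        return not any(r[i] and not r[j] for r in model)
    if kind == "E":  # No x is y: no non-empty region is both x and y
        return not any(r[i] and r[j] for r in model)
    if kind == "I":  # Some x is y
        return any(r[i] and r[j] for r in model)
    if kind == "O":  # Some x is not y
        return any(r[i] and not r[j] for r in model)
    raise ValueError(f"unknown proposition type {kind!r}")

# (major premise, minor premise) as (subject, predicate) pairs; the conclusion is always S-P.
FIGURES = {
    1: (("M", "P"), ("S", "M")),
    2: (("P", "M"), ("S", "M")),
    3: (("M", "P"), ("M", "S")),
    4: (("P", "M"), ("M", "S")),
}

# Every subset of the 8 regions is a possible model: 2**8 = 256 models.
MODELS = [[r for r, keep in zip(REGIONS, mask) if keep]
          for mask in product([False, True], repeat=len(REGIONS))]

def is_valid(mood: str, figure: int) -> bool:
    """Valid iff the conclusion holds in every model where both premises hold."""
    (maj_s, maj_p), (min_s, min_p) = FIGURES[figure]
    return all(
        holds(mood[2], "S", "P", m)
        for m in MODELS
        if holds(mood[0], maj_s, maj_p, m) and holds(mood[1], min_s, min_p, m)
    )

FORMS = [("".join(m), f) for m in product("AEIO", repeat=3) for f in FIGURES]  # 64 moods x 4 figures
VALID = [form for form in FORMS if is_valid(*form)]
print(len(FORMS), len(VALID))
```

A dataset's coverage of configurations can then be measured as the fraction of these 256 (mood, figure) pairs it contains.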

arXiv:2406.17216 [pdf, other]

Machine Unlearning Fails to Remove Data Poisoning Attacks

Authors: Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel

Abstract: We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of training on poisoned data. We experimentally demonstrate that, while existing unlearning methods have been shown to be effective in a number of evaluation settings (e.g., alleviating membership inference attacks), they fail to remove the effects of data poisoning across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when granted a relatively large compute budget. To precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, is required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful for efficiently removing poisoned data points without retraining, our work suggests that these methods are not yet "ready for prime time" and currently provide limited benefit over retraining.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16828 [pdf, other]

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Authors: Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin

Abstract: Did you try out the new Bing Search? Or maybe you fiddled around with Google AI Overviews? These might sound familiar because the modern-day search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow searching and incorporating real-time data into large language models (LLMs) to provide a well-informed, attributed, concise summary, in contrast to the traditional search paradigm that relies on displaying a ranked list of documents. Therefore, given these recent advancements, it is crucial to have an arena to build, test, visualize, and systematically evaluate RAG-based search systems. With this in mind, we propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems. In our work, we lay out the steps we have taken towards making this track a reality: we describe the details of our reusable framework, Ragnarök, explain the choice and curation of the new MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions that assist the end user. Next, using Ragnarök, we identify and provide key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. Further, we introduce a web-based user interface for an interactive arena that allows pairwise benchmarking of RAG systems through crowdsourcing. We open-source our Ragnarök framework and baselines to achieve a unified standard for future RAG systems.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16671 [pdf, other]

STAR: Swarm Technology for Aerial Robotics Research

Authors: Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, Guillaume Sartoretti

Abstract: In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges, we present STAR (Swarm Technology for Aerial Robotics Research), a framework developed explicitly to improve the accessibility of aerial swarm research experiments. Our framework introduces a swarm architecture based on the Crazyflie, a low-cost, open-source, palm-sized aerial platform, well suited for experimental swarm algorithms. To augment cost-effectiveness and mitigate the limitations of employing low-cost robots in experiments, we propose a landmark-based localization module leveraging fiducial markers. This module, also serving as a target detection module, enhances the adaptability and versatility of the framework. Additionally, collision and obstacle avoidance are implemented through velocity obstacles. The presented work strives to bridge the gap between theoretical advances and tangible implementations, thus fostering progress in the field.

Submitted 24 June, 2024; originally announced June 2024.
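The velocity-obstacle idea mentioned above can be illustrated in 2D with disc-shaped agents. A candidate velocity is rejected if, with both agents holding their current velocities, the discs would touch at some future time; the agent then picks the admissible velocity closest to its preferred one. This is a minimal sketch of the basic (non-reciprocal, infinite-horizon) formulation under those assumptions; STAR's actual variant, parameters and function names are not given in the abstract, and the names here are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float]
Obstacle = Tuple[Vec, Vec, float]  # (position, velocity, radius)

def in_velocity_obstacle(p_a: Vec, v_a: Vec, r_a: float,
                         p_b: Vec, v_b: Vec, r_b: float) -> bool:
    """True if A moving at v_a and B moving at v_b would come within r_a + r_b
    of each other at some future time, i.e. v_a lies in B's velocity obstacle."""
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]  # position of B relative to A
    vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]  # velocity of A relative to B
    r = r_a + r_b
    if dx * dx + dy * dy <= r * r:
        return True                             # discs already overlap
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return False                            # no relative motion, no future contact
    t = (dx * vx + dy * vy) / speed_sq          # time of closest approach
    if t <= 0.0:
        return False                            # separating: distance only grows
    cx, cy = dx - t * vx, dy - t * vy           # relative offset at closest approach
    return cx * cx + cy * cy <= r * r

def pick_velocity(preferred: Vec, candidates: Sequence[Vec], p_a: Vec, r_a: float,
                  obstacles: List[Obstacle]) -> Optional[Vec]:
    """Among candidates outside every obstacle's VO, return the one closest to
    the preferred velocity; None if every candidate leads to a collision."""
    safe = [v for v in candidates
            if not any(in_velocity_obstacle(p_a, v, r_a, p_b, v_b, r_b)
                       for p_b, v_b, r_b in obstacles)]
    if not safe:
        return None
    return min(safe, key=lambda v: (v[0] - preferred[0]) ** 2 + (v[1] - preferred[1]) ** 2)

# A drone at the origin wants to fly along +x, but a static obstacle sits at (10, 0).
choice = pick_velocity((1.0, 0.0), [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0)],
                       (0.0, 0.0), 0.5, [((10.0, 0.0), (0.0, 0.0), 0.5)])
print(choice)  # (0.7, 0.7)
```

In a swarm, each agent treats its neighbors as moving obstacles; reciprocal variants split the avoidance effort between agents to reduce oscillation.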

arXiv:2406.11704 [pdf, other]

Nemotron-4 340B Technical Report

Authors: Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of the data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.

Submitted 17 June, 2024; originally announced June 2024.
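A back-of-the-envelope check of the single-node claim, counting weights only. It ignores the KV cache, activations and runtime overhead, and assumes the 80 GB H100 variant used in DGX H100 systems and 340 billion as the parameter count.

```latex
$$340\times10^{9}\ \text{parameters} \times 1\ \text{byte (FP8)} \approx 340\ \text{GB}$$
$$340\times10^{9}\ \text{parameters} \times 2\ \text{bytes (BF16)} \approx 680\ \text{GB}$$
$$8\ \text{GPUs} \times 80\ \text{GB} = 640\ \text{GB per DGX H100}$$
```

In FP8 the weights occupy roughly half of the node's GPU memory, leaving headroom for inference state, whereas 16-bit weights alone would already exceed it.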

arXiv:2406.11251 [pdf, other]

Unifying Multimodal Retrieval via Document Screenshot Embedding

Authors: Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

Abstract: In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, error-prone, and loses information. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards document screenshots as a unified input format, which does not require any content extraction preprocessing and preserves all the information in a document (e.g., text, images and layout). DSE leverages a large vision-language model to directly encode document screenshots into dense representations for retrieval. To evaluate our method, we first construct Wiki-SS, a corpus of 1.3M Wikipedia web page screenshots, to answer questions from the Natural Questions dataset. In this text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing. For example, DSE outperforms BM25 by 17 points in top-1 retrieval accuracy. Additionally, in a mixed-modality task of slide retrieval, DSE significantly outperforms OCR text retrieval methods by over 15 points in nDCG@10. These experiments show that DSE is an effective document retrieval paradigm for diverse types of documents. Model checkpoints, code, and the Wiki-SS collection will be released.

Submitted 17 June, 2024; originally announced June 2024.
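Once screenshots and queries are encoded as dense vectors, retrieval and top-1 accuracy reduce to a nearest-neighbor search. The sketch below assumes cosine similarity and uses hypothetical 2-D vectors in place of the vision-language model's embeddings. It also simplifies top-1 accuracy to "the top-ranked page is in the query's gold set", whereas Natural Questions evaluations typically check whether the retrieved text contains an answer string.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query: Vector, corpus: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the IDs of the k corpus entries most similar to the query (exhaustive search)."""
    return sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]), reverse=True)[:k]

def top1_accuracy(queries: Dict[str, Vector], corpus: Dict[str, Vector],
                  gold: Dict[str, Set[str]]) -> float:
    """Fraction of queries whose top-ranked document is one of its gold documents."""
    hits = sum(1 for qid, qvec in queries.items() if retrieve(qvec, corpus, 1)[0] in gold[qid])
    return hits / len(queries)

# Hypothetical embeddings standing in for encoded screenshots and questions.
corpus = {"pageA": [1.0, 0.0], "pageB": [0.0, 1.0]}
queries = {"q1": [0.9, 0.1], "q2": [0.2, 0.8]}
gold = {"q1": {"pageA"}, "q2": {"pageA"}}
print(top1_accuracy(queries, corpus, gold))  # 0.5: q1 is a hit, q2 retrieves pageB
```

At the 1.3M-page scale an approximate nearest-neighbor index would replace the exhaustive sort, but the scoring logic is the same.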

arXiv:2406.10393 [pdf, other]

EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

Authors: Mohammad Dehghan, Mohammad Ali Alomrani, Sunyam Bagga, David Alfonso-Hermelo, Khalil Bibi, Abbas Ghaddar, Yingxue Zhang, Xiaoguang Li, Jianye Hao, Qun Liu, Jimmy Lin, Boxing Chen, Prasanna Parthasarathi, Mahdi Biparva, Mehdi Rezagholizadeh

Abstract: Citation-based QA systems are gaining attention, especially in generative AI search applications. The knowledge extracted for these systems is vital for both accuracy (completeness of information) and efficiency (extracting the information in a timely manner). In this regard, citation-based QA systems suffer from two shortcomings. First, they usually rely only on the web as a source of extracted knowledge, and adding other external knowledge sources can hamper the efficiency of the system. Second, web-retrieved content is usually obtained by simple heuristics such as fixed lengths or breakpoints, which might split information into pieces. To mitigate these issues, we propose an enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the extracted knowledge fed to the system. This is done by designing an adaptive web retriever and incorporating KG triples in an efficient manner. We demonstrate the effectiveness of EWEK-QA over the open-source state-of-the-art (SoTA) web-based and KG baseline models using a comprehensive set of quantitative and human evaluation experiments.
Our model is able to: first, improve on the web-retriever baseline in terms of extracting more relevant passages (>20%), coverage of the answer span (>25%) and self-containment (>35%); second, obtain and integrate KG triples into its pipeline very efficiently (by avoiding any LLM calls) to significantly outperform the web-only and KG-only SoTA baselines in 7 quantitative QA tasks and our human evaluation.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09355 [pdf, other]

Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models

Authors: Manveer Singh Tamber, Jasper Xian, Jimmy Lin

Abstract: Embedding models that generate representation vectors from natural language text are widely used, reflect substantial investments, and carry significant commercial value. Companies such as OpenAI and Cohere have developed competing embedding models accessed through APIs that require users to pay for usage. In this architecture, the models are "hidden" behind APIs, but this does not mean that they are "well guarded". We present, to our knowledge, the first effort to "steal" these models for retrieval by training local models on text-embedding pairs obtained from the commercial APIs. Using standard benchmarks, our experiments show that it is possible to efficiently replicate the retrieval effectiveness of the commercial embedding models using an attack that costs only around $200 to train (presumably) smaller models with fewer dimensions. Our findings raise important considerations for deploying commercial embedding models and suggest measures to mitigate the risk of model theft.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08673 [pdf, ps, other]

HelpSteer2: Open-source dataset for training top-performing reward models

Authors: Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev

Abstract: High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer, need to be updated to remain effective for reward modeling. Methods that distill preference data from proprietary LLMs such as GPT-4 are subject to restrictions on commercial usage imposed by model providers. To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). Using a powerful internal base model trained on HelpSteer2, we are able to achieve the SOTA score (92.0%) on Reward-Bench's primary dataset, outperforming currently listed open and proprietary models as of June 12th, 2024. Notably, HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets (e.g., HH-RLHF), which makes it highly efficient for training reward models. Our extensive experiments demonstrate that reward models trained with HelpSteer2 are effective in aligning LLMs. In particular, we propose SteerLM 2.0, a model alignment approach that can effectively make use of the rich multi-attribute scores predicted by our reward models. HelpSteer2 is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and code is available at https://github.com/NVIDIA/NeMo-Aligner

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08482 [pdf, other]

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

Authors: Raphael Tang, Xinyu Zhang, Lixinyu Xu, Yao Lu, Wenyan Li, Pontus Stenetorp, Jimmy Lin, Ferhan Ture

Abstract: Diffusion models are the state of the art in text-to-image generation, but their perceptual variability remains understudied. In this paper, we examine how prompts affect image variability in black-box diffusion-based models. We propose W1KP, a human-calibrated measure of variability in a set of images, bootstrapped from existing image-pair perceptual distances. Current datasets do not cover recent diffusion models, thus we curate three test sets for evaluation. Our best perceptual distance outperforms nine baselines by up to 18 points in accuracy, and our calibration matches graded human judgements 78% of the time. Using W1KP, we study prompt reusability and show that Imagen prompts can be reused for 10-50 random seeds before new images become too similar to already generated images, while Stable Diffusion XL and DALL-E 3 can be reused 50-200 times. Lastly, we analyze 56 linguistic features of real prompts, finding that the prompt's length, CLIP embedding norm, concreteness, and word senses influence variability most. As far as we are aware, we are the first to analyze diffusion variability from a visuolinguistic perspective. Our project page is at http://w1kp.com

Submitted 12 June, 2024; originally announced June 2024.

Comments: 13 pages, 11 figures
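The reusability question ("how many seeds before new images become too similar to already generated ones") can be framed as a scan over a sequence of generated images. In this simplified sketch, Euclidean distance between hypothetical image embeddings stands in for W1KP's human-calibrated perceptual distance, and the similarity threshold is an arbitrary assumption; neither reflects the paper's actual measure or settings.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]

def distance(u: Vector, v: Vector) -> float:
    """Stand-in perceptual distance: Euclidean distance between image embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean_pairwise_distance(images: List[Vector]) -> float:
    """Variability of a set of images as the mean distance over all unordered pairs."""
    pairs = [(i, j) for i in range(len(images)) for j in range(i + 1, len(images))]
    if not pairs:
        return 0.0
    return sum(distance(images[i], images[j]) for i, j in pairs) / len(pairs)

def seeds_before_repeat(images: List[Vector], threshold: float) -> Optional[int]:
    """Number of images generated before the first one closer than `threshold`
    to some earlier image; None if no image in the sequence is a near-repeat."""
    for n in range(1, len(images)):
        if min(distance(images[n], images[m]) for m in range(n)) < threshold:
            return n
    return None

# Hypothetical embeddings of images generated from one prompt with successive seeds.
images = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [0.1, 0.0]]
print(mean_pairwise_distance(images))
print(seeds_before_repeat(images, threshold=1.0))  # 3: the fourth image nearly repeats the first
```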

arXiv:2406.06519 [pdf, other]

UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor

Authors: Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin

Abstract: Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, rendering this process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can accurately perform the relevance assessment task and provide human-quality judgments, but unfortunately their study did not yield any reusable software artifacts. Our work presents UMBRELA (a recursive acronym that stands for UMbrela is the Bing RELevance Assessor), an open-source toolkit that reproduces the results of Thomas et al. using OpenAI's GPT-4o model and adds more nuance to the original paper. Across Deep Learning Tracks from TREC 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with rankings generated by effective multi-stage retrieval systems. Our toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines, offering researchers a valuable resource for studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid in relevance assessments, and we envision our toolkit becoming a foundation for further innovation in the field. UMBRELA is available at https://github.com/castorini/umbrela.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures

arXiv:2406.05364

Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models

Authors: Kalyan Nakka, Jimmy Dani, Nitesh Saxena

Abstract: In this paper, we present a first study to investigate the trust and ethical implications of on-device artificial intelligence (AI), focusing on "small" language models (SLMs) amenable to personal devices like smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they might also introduce significant challenges and vulnerabilities compared to their on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of state-of-the-art on-device SLMs, contrasted with their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show on-device SLMs to be (statistically) significantly less trustworthy, specifically demonstrating more stereotypical, unfair and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study by inferring whether SLMs would provide responses to potentially unethical vanilla prompts, collated from prior jailbreaking and prompt engineering studies and other sources. Strikingly, the on-device SLMs did provide valid responses to these prompts, which ideally should be rejected. Even more seriously, the on-device SLMs responded with valid answers without any filters and without the need for any jailbreaking or prompt engineering.
These responses can be abused for various harmful and unethical scenarios including: societal harm, illegal activities, hate, self-harm, exploitable phishing content and exploitable code, all of which indicate the high vulnerability and exploitability of these on-device SLMs. Overall, our findings highlight gaping vulnerabilities in state-of-the-art on-device AI which seem to stem from resource constraints faced by these models and which may make typical defenses fundamentally challenging to deploy in these environments.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 26 pages, 31 figures and 5 tables

arXiv:2406.00594

Artificial General Intelligence (AGI) for the oil and gas industry: a review

Authors: Jimmy Xuekai Li, Tiancheng Zhang, Yiran Zhu, Zhongwei Chen

Abstract: Artificial General Intelligence (AGI) is set to profoundly impact the oil and gas industry by introducing unprecedented efficiencies and innovations. This paper explores AGI's foundational principles and its transformative applications, particularly focusing on the advancements brought about by large language models (LLMs) and extensive computer vision systems in the upstream sectors of the industry. The integration of Artificial Intelligence (AI) has already begun reshaping the oil and gas landscape, offering enhancements in production optimization, downtime reduction, safety improvements, and advancements in exploration and drilling techniques. These technologies streamline logistics, minimize maintenance costs, automate monotonous tasks, refine decision-making processes, foster team collaboration, and amplify profitability through error reduction and actionable insights extraction. Despite these advancements, the deployment of AI technologies faces challenges, including the necessity for skilled professionals for implementation and the limitations of model training on constrained datasets, which affect the models' adaptability across different contexts. The advent of generative AI, exemplified by innovations like ChatGPT and the Segment Anything Model (SAM), heralds a new era of high-density innovation. These developments highlight a shift towards natural language interfaces and domain-knowledge-driven AI, promising more accessible and tailored solutions for the oil and gas industry.
This review articulates the vast potential AGI holds for tackling complex operational challenges within the upstream oil and gas industry, requiring near-human levels of intelligence. We discuss the promising applications, the hurdles of large-scale AGI model deployment, and the necessity for domain-specific knowledge in maximizing the benefits of these technologies.

Submitted 11 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: 20 Pages, Review paper, 15 Figures

arXiv:2405.19683

Breaking Indistinguishability with Transfer Learning: A First Look at SPECK32/64 Lightweight Block Ciphers

Authors: Jimmy Dani, Kalyan Nakka, Nitesh Saxena

Abstract: In this research, we introduce MIND-Crypt, a novel attack framework that uses deep learning (DL) and transfer learning (TL) to challenge the indistinguishability of block ciphers, specifically the SPECK32/64 encryption algorithm in CBC (Cipher Block Chaining) mode, against Known Plaintext Attacks (KPA). Our methodology includes training a DL model with ciphertexts of two messages encrypted using the same key. The selected messages have the same byte-length and differ by only one bit at the binary level. This DL model employs a residual network architecture. For the TL, we use the trained DL model as a feature extractor, and these features are then used to train a shallow machine learning model, such as XGBoost. This dual strategy aims to distinguish ciphertexts of two encrypted messages, addressing traditional cryptanalysis challenges. Our findings demonstrate that the DL model achieves an accuracy of approximately 99% under consistent cryptographic conditions (same key or rounds) with the SPECK32/64 cipher. However, performance degrades to random-guessing levels (50%) when tested with ciphertext generated from different keys or different encryption rounds of SPECK32/64. To enhance the results, the DL model requires retraining with different keys or encryption rounds using larger datasets (10^7 samples). To overcome this limitation, we implement TL, achieving an accuracy of about 53% with just 10,000 samples, which is better than random guessing.
Further training with 580,000 samples increases accuracy to nearly 99%, showing a substantial reduction in data requirements by over 94%. This shows that an attacker can utilize machine learning models to break indistinguishability by accessing pairs of plaintexts and their corresponding ciphertexts encrypted with the same key, without directly interacting with the communicating parties.

Submitted 30 May, 2024; originally announced May 2024.
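For readers unfamiliar with the target cipher, the following is a minimal pure-Python sketch of SPECK32/64 (16-bit words, a four-word key, 22 rounds, rotation amounts 7 and 2) together with a plain CBC wrapper. It is written from the public cipher specification, not from the MIND-Crypt code; the function names and the (l2, l1, l0, k0) key-word ordering are assumptions of this sketch, and the last lines check it against the designers' published test vector.

```python
# SPECK32/64 sketch: 16-bit words, 4-word key, 22 rounds, alpha = 7, beta = 2.
MASK = 0xFFFF
ALPHA, BETA, ROUNDS = 7, 2, 22


def ror(x: int, r: int) -> int:
    """Rotate a 16-bit word right by r bits."""
    return ((x >> r) | (x << (16 - r))) & MASK


def rol(x: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & MASK


def speck_round(x: int, y: int, k: int) -> tuple:
    """One SPECK round; the key schedule reuses it with the round index as key."""
    x = ((ror(x, ALPHA) + y) & MASK) ^ k
    y = rol(y, BETA) ^ x
    return x, y


def expand_key(key: tuple, rounds: int = ROUNDS) -> list:
    """Key words are ordered (l2, l1, l0, k0), as in the published test vector."""
    round_keys = [key[3]]
    ls = [key[2], key[1], key[0]]
    for i in range(rounds - 1):
        new_l, new_k = speck_round(ls[i], round_keys[i], i)
        ls.append(new_l)
        round_keys.append(new_k)
    return round_keys


def encrypt_block(block: tuple, round_keys: list) -> tuple:
    """Encrypt one 32-bit block given as a pair of 16-bit words."""
    x, y = block
    for k in round_keys:
        x, y = speck_round(x, y, k)
    return x, y


def encrypt_cbc(blocks: list, round_keys: list, iv: tuple) -> list:
    """CBC: XOR each plaintext block with the previous ciphertext block (IV first)."""
    prev, out = iv, []
    for x, y in blocks:
        prev = encrypt_block((x ^ prev[0], y ^ prev[1]), round_keys)
        out.append(prev)
    return out


ks = expand_key((0x1918, 0x1110, 0x0908, 0x0100))
print([hex(w) for w in encrypt_block((0x6574, 0x694C), ks)])  # ['0xa868', '0x42f2']
```

Two messages that differ in a single bit, as in the setup described above, can be produced by flipping one bit of a block before calling `encrypt_cbc` with the same key and IV.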

arXiv:2405.19325

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

Authors: Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin

Abstract: Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, we introduce Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST performs token-level retrieval at each inference step to compute a semi-parametric mixture distribution and identify promising span continuations in a corpus. It then uses an approximate speculative decoding procedure that accepts a prefix of the retrieved span or generates a new token. NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks, surpassing the conventional kNN-LM method and performing competitively with in-context retrieval augmentation. In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.

Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.
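The semi-parametric mixture mentioned in the NEST abstract builds on the kNN-LM idea of interpolating the LM's next-token distribution with one derived from datastore neighbors. Below is a minimal sketch of that interpolation with a fixed weight `lam`; it is not NEST's implementation, which per the abstract also identifies span continuations and applies approximate speculative decoding (neither is shown). All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=1.0):
    """neighbors: (distance, next_token) pairs retrieved from a datastore.
    Each neighbor gets weight exp(-distance / temperature); weights are summed
    per token and normalized. Shifting by the smallest distance avoids underflow
    and does not change the normalized result."""
    d_min = min(d for d, _ in neighbors)
    weights = defaultdict(float)
    for dist, token in neighbors:
        weights[token] += math.exp(-(dist - d_min) / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def mix(p_lm, p_knn, lam):
    """Semi-parametric mixture: p(w) = lam * p_knn(w) + (1 - lam) * p_lm(w)."""
    vocab = set(p_lm) | set(p_knn)
    return {w: lam * p_knn.get(w, 0.0) + (1 - lam) * p_lm.get(w, 0.0) for w in vocab}


p_lm = {"Paris": 0.6, "Lyon": 0.3, "Nice": 0.1}
neighbors = [(1.0, "Paris"), (1.0, "Paris"), (3.0, "Lyon")]
p = mix(p_lm, knn_distribution(neighbors), lam=0.3)
print(max(p, key=p.get))  # Paris
```

Because the neighbors agree on "Paris", the mixture shifts probability toward it relative to the parametric LM alone; tokens absent from the datastore keep only the `(1 - lam)` share of their LM probability.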

arXiv:2405.18685

Low-Mass Galaxy Interactions Trigger Black Hole Activity

Authors: Marko Mićić, Jimmy A. Irwin, Preethi Nair, Brenna N. Wells, Olivia J. Holmes, Jackson T. Eames

Abstract: The existence of high-$z$ over-massive supermassive black holes represents a major conundrum in our understanding of black hole evolution. In this paper, we probe from the observational point of view how early Universe environmental conditions could have acted as an evolutionary mechanism for the accelerated growth of the first black holes. Under the assumption that the early Universe is dominated by dwarf galaxies, we investigate the hypothesis that dwarf-dwarf galaxy interactions trigger black hole accretion. We present the discovery of 82 dwarf-dwarf galaxy pairs and 11 dwarf galaxy groups using the Hubble Space Telescope, doubling existing samples. The dwarf systems span a redshift range of $0.13<z<1.5$, and a stellar mass range of $7.24<\log(M_*/M_\odot)<9.73$. We performed an X-ray study of a subset of these dwarf systems with Chandra and detected six new AGN, increasing the number of known dwarf-dwarf-merger-related AGN from one to seven. We then compared the frequency of these AGN in grouped/paired dwarfs to that of isolated dwarfs and found a statistically significant enhancement ($4\sigma$ to $6\sigma$) in the interacting sample. This study, the first of its kind at the lowest mass scales, implies that the presence of a nearby dwarf neighbor is efficient in triggering black hole accretion. These results open new avenues for indirect studies of the emergence of the first supermassive black holes.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 19 pages, 5 figures, 4 tables. Accepted for publication in the Astrophysical Journal Letters

arXiv:2405.17387

Batteryless BLE and Light-based IoT Sensor Nodes for Reliable Environmental Sensing

Authors: Jimmy Fernandez Landivar, Khojiakbar Botirov, Hazem Sallouha, Marcos Katz, Sofie Pollin

Abstract: The sustainable design of Internet of Things (IoT) networks encompasses considerations related to energy efficiency and autonomy as well as considerations related to reliable communications, ensuring no energy is wasted on undelivered data. Under these considerations, this work proposes the design and implementation of energy-efficient Bluetooth Low Energy (BLE) and Light-based IoT (LIoT) batteryless IoT sensor nodes powered by an indoor light Energy Harvesting Unit (EHU). Our design intends to integrate these nodes into a sensing network to improve its reliability by combining both technologies and taking advantage of their features. The nodes incorporate state-of-the-art components, such as low-power sensors and efficient System-on-Chips (SoCs). Moreover, we design a strategy for adaptive switching between active and sleep cycles as a function of the available energy, allowing the IoT nodes to continuously operate without batteries. Our results show that by adapting the duty cycle of the BLE and LIoT nodes depending on the environment's light intensity, we can ensure a continuous and reliable node operation. In particular, measurements show that our proposed BLE and LIoT node designs are able to communicate with an IoT gateway in a bidirectional way, every 19.3 and 624.6 seconds, respectively, in an energy-autonomous and reliable manner.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 6 pages, 9 figures, accepted for publication in the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2024), Valencia, Spain

MSC Class: 94C30 ACM Class: I.2.9
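The adaptive duty cycling described for the batteryless BLE and LIoT nodes rests on an energy balance: a node can report only as often as harvested energy covers each active cycle plus its sleep consumption. The sketch below computes that minimum interval under a simplified model (constant harvested power, sleep power charged over the whole period); the function name and all figures are hypothetical and are not taken from the paper's measurements.

```python
from typing import Optional


def min_report_interval(harvest_w: float, sleep_w: float, active_j: float) -> Optional[float]:
    """Shortest period T (seconds) that is energy-neutral under a simple model:
    harvested energy P_h * T must cover one active cycle E_a plus sleep power
    P_s drawn over the whole period (slightly conservative, since the node is
    not asleep during the active phase):
        P_h * T >= E_a + P_s * T   =>   T >= E_a / (P_h - P_s)
    Returns None when harvesting cannot even sustain sleep mode."""
    surplus = harvest_w - sleep_w
    if surplus <= 0:
        return None
    return active_j / surplus


# Hypothetical figures: 100 uW harvested, 10 uW asleep, 1.8 mJ per report.
print(min_report_interval(100e-6, 10e-6, 1.8e-3))  # about 20 s
```

As light intensity drops, `harvest_w` falls and the interval grows, which matches the qualitative behavior described above; the paper's measured 19.3 s and 624.6 s intervals come from its own hardware, not from this formula.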

arXiv:2405.16759

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignment vs. high-resolution rendering. We first demonstrate the benefits of scaling a Shallow UNet, with no down(up)-sampling enc(dec)oder. Scaling its deep core layers is shown to improve alignment, object structure, and composition. Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large high-resolution datasets. This enables a single-stage model capable of generating high-resolution images without the need for a super-resolution cascade. Our key results rely on public datasets and show that we are able to train non-cascaded models of up to 8B parameters with no further regularization schemes. Vermeer, our full pipeline model trained with internal datasets to produce 1024x1024 images without cascades, is preferred over SDXL by 44.0% vs. 21.4% of human evaluators.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.10961

Simplified discrete model for axisymmetric dielectric elastomer membranes with robotic applications

Authors: Zhaowei Liu, Mingchao Liu, K. Jimmy Hsia, Xiaonan Huang, Weicheng Huang

Abstract: Soft robots utilizing inflatable dielectric membranes can realize intricate functionalities through the application of non-mechanical fields. However, given the current limitations in simulations, including low computational efficiency and difficulty in dealing with complex external interactions, the design and control of such soft robots often require trial and error. Thus, a novel one-dimensional (1D) discrete differential geometry (DDG)-based numerical model is developed for analyzing the highly nonlinear mechanics in axisymmetric inflatable dielectric membranes. The model captures the intricate dynamics of these membranes under both inflationary pressure and electrical stimulation. Comprehensive validations using hyperelastic benchmarks demonstrate the model's accuracy and reliability. Additionally, the focus on the electro-mechanical coupling elucidates critical insights into the membrane's behavior under varying internal pressures and electrical loads. The research further translates these findings into innovative soft robotic applications, including a spherical soft actuator, a soft circular fluid pump, and a soft toroidal gripper, where the snap-through of electroelastic membrane plays a crucial role. Our analyses reveal that the functional ranges of soft robots are amplified by the snap-through of an electroelastic membrane upon electrical stimuli.
This study underscores the potential of DDG-based simulations to advance the understanding of the nonlinear mechanics of electroelastic membranes and guide the design of electroelastic actuators in soft robotics applications.

Submitted 23 April, 2024; originally announced May 2024.

Comments: 27 pages, 8 figures

arXiv:2405.10311

UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models

Authors: Sahel Sharifymoghaddam, Shivani Upadhyay, Wenhu Chen, Jimmy Lin

Abstract: Recently, Multi-Modal (MM) Large Language Models (LLMs) have unlocked many complex use-cases that require MM understanding (e.g., image captioning or visual question answering) and MM generation (e.g., text-guided image generation or editing) capabilities. To further improve the output fidelity of MM-LLMs, we introduce the model-agnostic UniRAG technique that adds relevant retrieved information to prompts as few-shot examples during inference. Contrary to the common belief that Retrieval Augmentation (RA) mainly improves generation or understanding of uncommon entities, our evaluation results on the MSCOCO dataset with common entities show that both proprietary models like GPT4 and Gemini-Pro and smaller open-source models like Llava, LaVIT, and Emu2 significantly enhance their generation quality when their input prompts are augmented with relevant information retrieved by MM retrievers like UniIR models.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures
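UniRAG, as described above, prepends retrieved examples to the prompt as few-shot demonstrations. A minimal sketch of that prompt assembly follows, assuming a toy corpus of precomputed embeddings and text placeholders standing in for images; a real system would use an MM retriever such as the UniIR models named in the abstract and pass actual images to the model. The function names and the prompt layout are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_fewshot_prompt(query_emb, query_image, corpus, k=2):
    """corpus: (embedding, image_id, caption) triples. The k entries most similar
    to the query are formatted as few-shot demonstrations before the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    shots = [f"Image: {img}\nCaption: {cap}" for _, img, cap in ranked[:k]]
    return "\n\n".join(shots + [f"Image: {query_image}\nCaption:"])


corpus = [
    ([1.0, 0.0], "img_a", "A dog lying on the grass."),
    ([0.0, 1.0], "img_b", "A red car parked on a street."),
    ([0.9, 0.1], "img_c", "A puppy running in a park."),
]
print(build_fewshot_prompt([1.0, 0.05], "img_query", corpus))
```

The query ends with an open "Caption:" slot, so the model completes it in the style of the retrieved demonstrations; `k` trades prompt length against the amount of retrieved context.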

arXiv:2405.07503

Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

Authors: Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, Jeannette Bohg

Abstract: Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low-latency decision-making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across six simulation tasks as well as three real-world tasks where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Consistency Policy training procedure is robust to the pretrained Diffusion Policy's quality, a useful result that helps practitioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.

Submitted 28 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: https://consistency-policy.github.io/

arXiv:2405.06147

State-Free Inference of State-Space Models: The Transfer Function Approach

Authors: Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli

Abstract: We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence-parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency-domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers (parametrized in the time domain) on the Long Range Arena benchmark, while delivering state-of-the-art downstream performance over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.

Submitted 1 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Resubmission 02/06/2024: Fixed minor typo of recurrent form RTF
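The state-free idea in the RTF abstract can be illustrated with a rational transfer function H(z) = B(z)/A(z): evaluating numerator and denominator on the roots of unity takes one zero-padded FFT each, so obtaining the kernel's spectrum costs the same regardless of the state size (the order of A). The NumPy sketch below follows that idea under simplifying assumptions (single input and output, a stable denominator, coefficients given directly); it is not the authors' parametrization or code from the linked repository.

```python
import numpy as np


def rtf_kernel(b, a, length):
    """Impulse response of H(z) = B(z) / A(z), truncated to `length` taps.

    b, a: coefficients in powers of z^-1 (a[0] = 1), each at most `length` long.
    One zero-padded FFT per polynomial evaluates it on the `length` roots of
    unity, so the cost depends on `length`, not on the state size len(a) - 1.
    The inverse FFT gives the time-aliased response sum_m h[n + m * length],
    which matches h[n] closely when the (stable) response decays quickly."""
    spectrum = np.fft.fft(b, n=length) / np.fft.fft(a, n=length)
    return np.fft.ifft(spectrum).real


def fft_causal_conv(u, kernel):
    """y[t] = sum_s kernel[s] * u[t - s], computed with FFTs and zero-padding
    so that no circular wrap-around occurs."""
    n = len(u) + len(kernel) - 1
    y = np.fft.ifft(np.fft.fft(u, n) * np.fft.fft(kernel, n)).real
    return y[: len(u)]


# First-order example: H(z) = 1 / (1 - 0.5 z^-1), so h[n] = 0.5 ** n.
h = rtf_kernel([1.0], [1.0, -0.5], 64)
print(np.allclose(h, 0.5 ** np.arange(64)))  # True
```

Once the kernel is known, the layer output is a single FFT convolution over the whole sequence, which is what makes this form parallel over time; a recurrent (state-based) form would instead step through the sequence one token at a time.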

arXiv:2405.05562

Review-based Recommender Systems: A Survey of Approaches, Challenges and Future Perspectives

Authors: Emrul Hasan, Mizanur Rahman, Chen Ding, Jimmy Xiangji Huang, Shaina Raza

Abstract: Recommender systems play a pivotal role in helping users navigate an overwhelming selection of products and services. On online platforms, users have the opportunity to share feedback in various modes, including numerical ratings, textual reviews, and likes/dislikes. Traditional recommendation systems rely on users' explicit ratings or implicit interactions (e.g., likes, clicks, shares, saves) to learn user preferences and item characteristics. Beyond these numerical ratings, textual reviews provide insights into users' fine-grained preferences and item features. Analyzing these reviews is crucial for enhancing the performance and interpretability of personalized recommendation results. In recent years, review-based recommender systems have emerged as a significant sub-field in this domain. In this paper, we provide a comprehensive overview of the developments in review-based recommender systems over recent years, highlighting the importance of reviews in recommender systems, as well as the challenges associated with extracting features from reviews and integrating them into ratings. Specifically, we present a categorization of these systems and summarize the state-of-the-art methods, analyzing their unique features, effectiveness, and limitations. Finally, we propose potential directions for future research, including the integration of multimodal data, multi-criteria rating information, and ethical considerations.

Submitted 11 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: The first two authors contributed equally

arXiv:2405.04867

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng, et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2405.04727

LLMs Can Patch Up Missing Relevance Judgments in Evaluation

Authors: Shivani Upadhyay, Ehsan Kamalloo, Jimmy Lin

Abstract: Unjudged documents or holes in information retrieval benchmarks are considered non-relevant in evaluation, yielding no gains in measuring effectiveness. However, these missing judgments may inadvertently introduce biases into the evaluation as their prevalence for a retrieval model is heavily contingent on the pooling process. Thus, filling holes becomes crucial in ensuring reliable and accurate evaluation. Collecting human judgment for all documents is cumbersome and impractical. In this paper, we aim to leverage large language models (LLMs) to automatically label unjudged documents. Our goal is to instruct an LLM using detailed instructions to assign fine-grained relevance judgments to holes. To this end, we systematically simulate scenarios with varying degrees of holes by randomly dropping relevant documents from the relevance judgment in TREC DL tracks. Our experiments reveal a strong correlation between our LLM-based method and ground-truth relevance judgments. Based on our simulation experiments conducted on three TREC DL datasets, in the extreme scenario of retaining only 10% of judgments, our method achieves a Kendall tau correlation of 0.87 and 0.92 on average for Vicuña-7B and GPT-3.5 Turbo, respectively.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures
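The Kendall tau figures quoted above compare how two sets of relevance judgments order the same retrieval systems. A minimal sketch of Kendall's tau-a over per-system scores is shown below; the scores are hypothetical, and the paper may use a tie-corrected variant such as tau-b (the default in SciPy's `scipy.stats.kendalltau`).

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same systems: (C - D) / P,
    where C and D count concordant and discordant system pairs and P is the
    number of pairs. Tied pairs count as neither (no tau-b correction)."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equal-length score lists with at least two systems")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical nDCG@10 of four systems under full human qrels vs. LLM-filled qrels.
human = [0.70, 0.65, 0.60, 0.55]
llm_filled = [0.68, 0.69, 0.58, 0.50]
print(round(kendall_tau_a(human, llm_filled), 3))  # 0.667
```

Here only the top two systems swap places, giving five concordant pairs and one discordant pair out of six, so tau = (5 - 1) / 6.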

arXiv:2405.01525

FLAME: Factuality-Aware Alignment for Large Language Models

Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

Abstract: Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e., hallucination). In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that training the LLM on new knowledge or unfamiliar texts can encourage hallucination. This makes SFT less factual as it trains on human-labeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL can also encourage hallucination, because they guide the LLM to provide more helpful responses on a diverse set of instructions, often preferring longer and more detailed responses. Based on these observations, we propose factuality-aware alignment, comprising factuality-aware SFT and factuality-aware RL through direct preference optimization. Experiments show that our proposed factuality-aware alignment guides LLMs to output more factual responses while maintaining instruction-following capability.

Submitted 2 May, 2024; originally announced May 2024.
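FLAME's factuality-aware RL step is built on direct preference optimization (DPO). For reference only, a minimal sketch of the standard per-pair DPO loss follows; it is not FLAME's factuality-aware variant, whose construction of preference pairs the abstract does not detail. The function name dpo_loss and the default beta=0.1 are illustrative assumptions, and the inputs are assumed to be log-probabilities summed over the response tokens.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When policy and reference agree exactly the loss is log 2; it falls below log 2 as the policy shifts probability toward the chosen response.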

arXiv:2405.01481 [pdf, other]

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Authors: Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev

Abstract: Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to using hundreds of GPUs for training. NeMo-Aligner comes with highly optimized and scalable implementations for major paradigms of model alignment such as: Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Additionally, our toolkit supports running most of the alignment techniques in a Parameter Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, allowing support for other alignment techniques with minimal effort. It is open-sourced with Apache 2.0 License and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner

Submitted 2 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures

arXiv:2404.19321 [pdf, other]

Observation of strain-rate softening behavior in jammed granular media

Authors: Mingchao Liu, Weining Mao, Yiqiu Zhao, Qin Xu, Yixiang Gan, Yifan Wang, K Jimmy Hsia

Abstract: The strain-rate sensitivity of confined granular materials has been widely explored, with most findings exhibiting rate-strengthening behaviors. This study, however, reveals a distinct rate-softening behavior across a certain strain rate range based on triaxial tests on particle clusters of various materials with different surface properties, particle sizes, shapes, and stiffness. This softening effect is especially pronounced in the case of common rice particles. By examining the behavior of rice particles under different confining pressure and surface conditions, and directly measuring the frictional coefficient across various loading rates, we find that the reduction in surface frictional coefficient with the increasing strain rate predominantly contributes to this rate-softening behavior. This conclusion is validated by results from Finite Element Method (FEM) simulations. Additionally, we identify confining pressure as a critical factor regulating the normal stress between particles, and thereby enhancing frictional behavior. Rheometer tests reveal that the shear modulus exhibits a similar rate-softening trend. This study of rate-softening behavior in granular materials enhances our understanding of the mechanisms during their deformation under confining pressure. It also suggests that local inter-particle tribology significantly impacts overall granular behavior.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 16 pages, 12 figures

arXiv:2404.19165 [pdf, other]

DelGrad: Exact gradients in spiking networks for learning transmission delays and weights

Authors: Julian Göltz, Jimmy Weber, Laura Kriener, Peter Lake, Melika Payvand, Mihai A. Petrovici

Abstract: Spiking neural networks (SNNs) inherently rely on the timing of signals for representing and processing information. Transmission delays play an important role in shaping these temporal characteristics. Recent work has demonstrated the substantial advantages of learning these delays along with synaptic weights, both in terms of accuracy and memory efficiency. However, these approaches suffer from drawbacks in terms of precision and efficiency, as they operate in discrete time and with approximate gradients, while also requiring membrane potential recordings for calculating parameter updates. To alleviate these issues, we propose an analytical approach for calculating exact loss gradients with respect to both synaptic weights and delays in an event-based fashion. The inclusion of delays emerges naturally within our proposed formalism, enriching the model's search space with a temporal dimension. Our algorithm is purely based on the timing of individual spikes and does not require access to other variables such as membrane potentials. We explicitly compare the impact on accuracy and parameter efficiency of different types of delays - axonal, dendritic and synaptic. Furthermore, while previous work on learnable delays in SNNs has been mostly confined to software simulations, we demonstrate the functionality and benefits of our approach on the BrainScaleS-2 neuromorphic platform.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 15 pages, 7 figures

arXiv:2404.18424 [pdf, other]

PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval

Authors: Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, Guido Zuccon

Abstract: Utilizing large language models (LLMs) for zero-shot document ranking is done in one of two ways: 1) prompt-based re-ranking methods, which require no further training but are only feasible for re-ranking a handful of candidate documents due to computational costs; and 2) unsupervised contrastive trained dense retrieval methods, which can retrieve relevant documents from the entire corpus but require a large amount of paired text data for contrastive training. In this paper, we propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus. Our method only requires prompts to guide an LLM to generate query and document representations for effective document retrieval. Specifically, we prompt the LLMs to represent a given text using a single word, and then use the last token's hidden states and the corresponding logits associated with the prediction of the next token to construct a hybrid document retrieval system. The retrieval system harnesses both dense text embedding and sparse bag-of-words representations given by the LLM. We further explore variations of this core idea that consider the generation of multiple words, and representations that rely on multiple embeddings and sparse distributions.
Our experimental evaluation on the MSMARCO, TREC deep learning and BEIR zero-shot document retrieval datasets illustrates that this simple prompt-based LLM retrieval method can achieve a similar or higher retrieval effectiveness than state-of-the-art LLM embedding methods that are trained with large amounts of unsupervised data, especially when using a larger LLM.

Submitted 16 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.15807 [pdf, other]

One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion

Authors: Zhiwen Xie, Yi Zhang, Guangyou Zhou, ** Liu, Xinhui Tu, Jimmy Xiangji Huang

Abstract: Knowledge Graph Completion (KGC) has garnered massive research interest recently, and most existing methods are designed following a transductive setting where all entities are observed during training. Despite the great progress on the transductive KGC, these methods struggle to conduct reasoning on emerging KGs involving unseen entities. Thus, inductive KGC, which aims to deduce missing links among unseen entities, has become a new trend. Many existing studies transform inductive KGC into a graph classification problem by extracting enclosing subgraphs surrounding each candidate triple. Unfortunately, they still face certain challenges, such as the expensive time consumption caused by the repeated extraction of enclosing subgraphs, and the deficiency of entity-independent feature learning. To address these issues, we propose a global-local anchor representation (GLAR) learning method for inductive KGC. Unlike previous methods that utilize enclosing subgraphs, we extract a shared opening subgraph for all candidates and perform reasoning on it, enabling the model to perform reasoning more efficiently. Moreover, we design some transferable global and local anchors to learn rich entity-independent features for emerging entities. Finally, a global-local graph reasoning model is applied on the opening subgraph to rank all candidates. Extensive experiments show that our GLAR outperforms most existing state-of-the-art methods.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15279 [pdf, other]

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

Authors: Jimmy Lin, Junkai Li, Jiasi Gao, Weizhi Ma, Yang Liu

Abstract: Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performances. In this paper, we design Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task in our model, which aims to enhance the transformer in modeling the spatio-temporal features of tactile signals. Specifically, the designed temporal pretraining task is to differentiate the time order of tubelet inputs to model the temporal properties explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods in all metrics.

Submitted 20 January, 2024; originally announced April 2024.

Comments: Accepted by AAAI 2024

arXiv:2404.10981 [pdf, other]

A Survey on Retrieval-Augmented Text Generation for Large Language Models

Authors: Yizheng Huang, Jimmy Huang

Abstract: Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Ongoing work

arXiv:2404.05386 [pdf, other]

MealRec$^+$: A Meal Recommendation Dataset with Meal-Course Affiliation for Personalization and Healthiness

Authors: Ming Li, Lin Li, Xiaohui Tao, Jimmy Xiangji Huang

Abstract: Meal recommendation, as a typical health-related recommendation task, contains complex relationships between users, courses, and meals. Among them, meal-course affiliation associates user-meal and user-course interactions. However, an extensive literature review demonstrates that there is a lack of publicly available meal recommendation datasets including meal-course affiliation. Meal recommendation research has been constrained in exploring the impact of cooperation between two levels of interaction on personalization and healthiness. To pave the way for meal recommendation research, we introduce a new benchmark dataset called MealRec$^+$. Due to constraints related to user health privacy and meal scenario characteristics, the collection of data that includes both meal-course affiliation and two levels of interactions is impeded. Therefore, a simulation method is adopted to derive meal-course affiliation and user-meal interaction from the user's dining sessions simulated based on user-course interaction data. Then, two well-known nutritional standards are used to calculate the healthiness scores of meals. Moreover, we experiment with several baseline models, including separate and cooperative interaction learning methods. Our experiment demonstrates that cooperating the two levels of interaction in appropriate ways is beneficial for meal recommendations. Furthermore, in response to the less healthy recommendation phenomenon found in the experiment, we explore methods to enhance the healthiness of meal recommendations.
The dataset is available on GitHub (https://github.com/WUT-IDEA/MealRecPlus).

Submitted 27 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted by SIGIR 2024

arXiv:2403.18639 [pdf, other]

Dependency Aware Incident Linking in Large Cloud Systems

Authors: Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan Rajmohan

Abstract: Despite significant reliability efforts, large-scale cloud services inevitably experience production incidents that can significantly impact service availability and customer satisfaction. Worse, in many cases one incident can lead to multiple downstream failures due to cascading effects that create several related incidents across different dependent services. Oftentimes On-call Engineers (OCEs) examine these incidents in silos, which leads to a significant amount of manual toil and increases the overall time-to-mitigate incidents. Therefore, developing efficient incident linking models is of paramount importance for grouping related incidents into clusters so as to quickly resolve major outages and reduce on-call fatigue. Existing incident linking methods mostly leverage textual and contextual information of incidents (e.g., title, description, severity, impacted components), thus failing to leverage the inter-dependencies between services. In this paper, we propose the dependency-aware incident linking (DiLink) framework which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links not only coming from the same service, but also from different services and workloads. Furthermore, we propose a novel method to align the embeddings of multi-modal (i.e., textual and graphical) data using Orthogonal Procrustes. Extensive experimental results on real-world incidents from 5 workloads of Microsoft demonstrate that our alignment method has an F1-score of 0.96 (14% gain over current state-of-the-art methods).
We are also in the process of deploying this solution across 610 services from these 5 workloads to continuously support OCEs, improving incident management and reducing manual toil.

Submitted 5 February, 2024; originally announced March 2024.
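DiLink aligns textual and graph embeddings with Orthogonal Procrustes. A minimal sketch of the classical Procrustes solution follows, assuming both modalities have already been brought to the same dimension and that row i of each matrix describes the same incident (neither detail is stated in the abstract). The orthogonal R minimizing ||A R - B||_F is R = U V^T, where U Σ V^T is the SVD of A^T B. The helper name is hypothetical; SciPy provides the same computation as scipy.linalg.orthogonal_procrustes.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F.

    A, B: (n_items, d) arrays whose rows are paired embeddings
    (e.g. text and graph embeddings of the same incident).
    """
    # SVD of the d x d cross-covariance matrix A^T B.
    U, _, Vt = np.linalg.svd(A.T @ B)
    # The optimal orthogonal map discards the singular values: R = U V^T.
    return U @ Vt

# Usage: recover a known rotation between two embedding spaces.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(50, 8))
true_R, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
graph_emb = text_emb @ true_R
R = orthogonal_procrustes(text_emb, graph_emb)
```

Constraining R to be orthogonal preserves distances and angles within the source space, so the alignment cannot distort the text embeddings' internal geometry.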

arXiv:2403.12945 [pdf, other]

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project website: https://droid-dataset.github.io/

arXiv:2403.11407 [pdf, other]

Divide-and-Conquer Posterior Sampling for Denoising Diffusion Priors

Authors: Yazid Janati, Alain Durmus, Eric Moulines, Jimmy Olsson

Abstract: Interest in the use of Denoising Diffusion Models (DDM) as priors for solving inverse Bayesian problems has recently increased significantly. However, sampling from the resulting posterior distribution poses a challenge. To solve this problem, previous works have proposed approximations to bias the drift term of the diffusion. In this work, we take a different approach and utilize the specific structure of the DDM prior to define a set of intermediate and simpler posterior sampling problems, resulting in a lower approximation error compared to previous methods. We empirically demonstrate the reconstruction capability of our method for general linear inverse problems using synthetic examples and various image restoration tasks.

Submitted 17 March, 2024; originally announced March 2024.

Comments: preprint

arXiv:2403.09969 [pdf, other]

Prediction of Vessel Arrival Time to Pilotage Area Using Multi-Data Fusion and Deep Learning

Authors: Xiaocai Zhang, Xiuju Fu, Zhe Xiao, Haiyan Xu, Xiaoyang Wei, Jimmy Koh, Daichi Ogawa, Zheng Qin

Abstract: This paper investigates the prediction of vessels' arrival time to the pilotage area using multi-data fusion and deep learning approaches. Firstly, the vessel arrival contour is extracted based on Multivariate Kernel Density Estimation (MKDE) and clustering. Secondly, multiple data sources, including Automatic Identification System (AIS), pilotage booking information, and meteorological data, are fused before latent feature extraction. Thirdly, a Temporal Convolutional Network (TCN) framework that incorporates a residual mechanism is constructed to learn the hidden arrival patterns of the vessels. Extensive tests on two real-world data sets from Singapore have been conducted and the following promising results have been obtained: 1) fusion of pilotage booking information and meteorological data improves the prediction accuracy, with pilotage booking information having a more significant impact; 2) using discrete embedding for the meteorological data performs better than using continuous embedding; 3) the TCN outperforms the state-of-the-art baseline methods in regression tasks, exhibiting Mean Absolute Error (MAE) ranging from 4.58 min to 4.86 min; and 4) approximately 89.41% to 90.61% of the absolute prediction residuals fall within a time frame of 10 min.

Submitted 14 March, 2024; originally announced March 2024.

Comments: The 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)
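The two accuracy figures reported for the TCN, MAE in minutes and the share of absolute prediction residuals within 10 min, can be computed as in the minimal sketch below. The helper name arrival_time_metrics is hypothetical, and the inclusive "<= 10 min" comparison is an assumption, since the abstract does not say whether the boundary is included.

```python
def arrival_time_metrics(predicted_min, actual_min, tolerance_min=10.0):
    """Return (MAE, share of |residual| <= tolerance), times in minutes."""
    if len(predicted_min) != len(actual_min) or not predicted_min:
        raise ValueError("need two non-empty sequences of equal length")
    # Absolute prediction residuals, one per vessel arrival.
    residuals = [abs(p - a) for p, a in zip(predicted_min, actual_min)]
    mae = sum(residuals) / len(residuals)
    # Fraction of residuals inside the tolerance window (inclusive, assumed).
    share_within = sum(r <= tolerance_min for r in residuals) / len(residuals)
    return mae, share_within

# Usage with made-up arrival times (minutes), not data from the paper:
mae, share = arrival_time_metrics([10, 20, 30, 50], [12, 20, 45, 50])
```

Here the residuals are 2, 0, 15 and 0 minutes, so MAE is 4.25 min and three of four predictions (0.75) fall within the 10 min window.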

arXiv:2403.08146 [pdf, ps, other]

Nodal solutions to Paneitz-type equations

Authors: Jurgen Julio-Batalla, Jimmy Petean

Abstract: On a closed Riemannian manifold $(M^n ,g)$ with a proper isoparametric function $f$ we consider the equation $Δ^2 u -αΔu +βu = u^q$, where $α$ and $β$ are positive constants satisfying that $α^2 \geq 4 β$. We let ${\bf m}$ be the minimum of the dimensions of the focal varieties of $f$ and $q_f = \frac{n-{\bf m}+4}{n-{\bf m}-4}$, $q_f = \infty$ if $n\leq {\bf m}+4$. We prove the existence of infinitely many nodal solutions of the equation assuming that $1<q<q_f$. The solutions are $f$-invariant. To obtain the result, first we prove a $C^0-$estimate for positive $f$-invariant solutions of the equation. Then we prove the existence of mountain pass solutions with arbitrarily large energy.

Submitted 12 March, 2024; originally announced March 2024.
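The admissible range $1<q<q_f$ follows directly from the definition of $q_f$ in the abstract and can be checked mechanically. A small sketch (the function names are hypothetical): with ${\bf m}=0$ the formula reduces to $(n+4)/(n-4)$, the familiar critical exponent for fourth-order Paneitz-type operators, so a positive focal-variety dimension ${\bf m}$ effectively lowers the dimension entering the exponent and enlarges the admissible range.

```python
import math

def critical_exponent(n, m):
    """q_f = (n - m + 4) / (n - m - 4), and q_f = infinity when n <= m + 4.

    n: dimension of the manifold M; m: minimum dimension of the focal
    varieties of the isoparametric function f.
    """
    if n <= m + 4:
        return math.inf
    return (n - m + 4) / (n - m - 4)

def exponent_admissible(q, n, m):
    """True when 1 < q < q_f, the range covered by the existence result."""
    return 1 < q < critical_exponent(n, m)

# Usage: n = 10, m = 1 gives q_f = 13/5 = 2.6; any q > 1 is allowed if n <= m + 4.
print(critical_exponent(10, 1), exponent_admissible(3.0, 10, 1))
```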

arXiv:2403.07495 [pdf, other]

Tuning diagonal scale matrices for HMC

Authors: Jimmy Huy Tran, Tore Selland Kleppe

Abstract: Three approaches for adaptively tuning diagonal scale matrices for HMC are discussed and compared. The common practice of scaling according to estimated marginal standard deviations is taken as a benchmark. Scaling according to the mean log-target gradient (ISG), and a scaling method targeting that the frequency of when the underlying Hamiltonian dynamics crosses the respective medians should be uniform across dimensions, are taken as alternatives. Numerical studies suggest that the ISG method leads in many cases to more efficient sampling than the benchmark, in particular in cases with strong correlations or non-linear dependencies. The ISG method is also easy to implement, computationally cheap and would be relatively simple to include in automatically tuned codes as an alternative to the benchmark practice.

Submitted 12 March, 2024; originally announced March 2024.
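The benchmark practice named above, scaling each coordinate by its estimated marginal standard deviation, can be sketched as follows. This illustrates the benchmark only, not the ISG or median-crossing methods, whose details the abstract does not give. The list-of-draws input format and the convention of a diagonal mass matrix M = diag(1/s_i^2), so that momentum is drawn as p_i ~ N(0, 1/s_i^2), are assumptions; the function names are hypothetical.

```python
import random
import statistics

def marginal_sd_scales(warmup_draws):
    """Per-dimension sample standard deviations s_i from warmup draws.

    warmup_draws: list of samples, each a list of floats (one per dimension).
    """
    columns = zip(*warmup_draws)  # regroup the draws by dimension
    return [statistics.stdev(col) for col in columns]

def sample_momentum(scales, rng=random):
    """Draw HMC momentum for the assumed mass matrix M = diag(1 / s_i^2)."""
    # With M_ii = 1 / s_i^2, component p_i ~ N(0, M_ii) has std 1 / s_i.
    return [rng.gauss(0.0, 1.0 / s) for s in scales]

# Usage with three toy warmup draws of a 2-dimensional target:
scales = marginal_sd_scales([[0.0, 10.0], [2.0, 10.0], [4.0, 16.0]])
p = sample_momentum(scales, random.Random(0))
```

Under this convention the inverse mass matrix equals the estimated marginal variances, so wide dimensions receive proportionally larger position updates per leapfrog step.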

arXiv:2403.03218 [pdf, other]

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai

Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: See the project page at https://wmdp.ai

arXiv:2403.00784 [pdf, other]

Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

Authors: Jiajia Wang, Jimmy X. Huang, Xinhui Tu, Junmei Wang, Angela J. Huang, Md Tahmid Rahman Laskar, Amran Bhuiyan

Abstract: Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, such that they struggled to capture the contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) leads to a robust encoder for the transformer model that can understand the broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that focuses on a comprehensive analysis of prevalent approaches that apply pretrained transformer encoders like BERT to IR can thus be useful for academia and the industry. In light of this, we revisit a variety of BERT-based methods in this survey, cover a wide range of techniques of IR, and group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. A key highlight of our survey is the comparison between BERT's encoder-based models and the latest generative Large Language Models (LLMs), such as ChatGPT, which rely on decoders.
Despite the popularity of LLMs, we find that for specific tasks, finely tuned BERT encoders still outperform, and at a lower deployment cost. Finally, we summarize the comprehensive outcomes of the survey and suggest directions for future research in the area.

Submitted 18 February, 2024; originally announced March 2024.

arXiv:2402.18545

Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset

Authors: Abbi Ward, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley Carrick, Bilson Campana, Jay Hartford, Pradeep Kumar S, Tiya Tiyasirichokchai, Sunny Virmani, Renee Wong, Yossi Matias, Greg S. Corrado, Dale R. Webster, Dawn Siegel, Steven Lin, Justin Ko, Alan Karthikesalingam, Christopher Semturs, Pooja Rao

Abstract: Background: Health datasets from clinical sources do not reflect the breadth and diversity of disease in the real world, impacting research, medical education, and artificial intelligence (AI) tool development. Dermatology is a suitable area to develop and test a new and scalable method to create representative health datasets. Methods: We used Google Search advertisements to invite contributions to an open access dataset of images of dermatology conditions, demographic and symptom information. With informed contributor consent, we describe and release this dataset containing 10,408 images from 5,033 contributions from internet users in the United States over 8 months starting March 2023. The dataset includes dermatologist condition labels as well as estimated Fitzpatrick Skin Type (eFST) and Monk Skin Tone (eMST) labels for the images. Results: We received a median of 22 submissions/day (IQR 14-30). Female (66.72%) and younger (52% < age 40) contributors had a higher representation in the dataset compared to the US population, and 32.6% of contributors reported a non-White racial or ethnic identity. Over 97.5% of contributions were genuine images of skin conditions. Dermatologist confidence in assigning a differential diagnosis increased with the number of available variables, and showed a weaker correlation with image sharpness (Spearman's P values <0.001 and 0.01, respectively). Most contributions were short-duration (54% with onset < 7 days ago), and 89% were allergic, infectious, or inflammatory conditions.
eFST and eMST distributions reflected the geographical origin of the dataset. The dataset is available at github.com/google-research-datasets/scin. Conclusion: Search ads are effective at crowdsourcing images of health conditions. The SCIN dataset bridges important gaps in the availability of representative images of common skin conditions.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.11203

doi 10.3233/WEB-230363

Exploring ChatGPT for Next-generation Information Retrieval: Opportunities and Challenges

Authors: Yizheng Huang, Jimmy Huang

Abstract: The rapid advancement of artificial intelligence (AI) has highlighted ChatGPT as a pivotal technology in the field of information retrieval (IR). Distinguished from its predecessors, ChatGPT offers significant benefits that have attracted the attention of both industry and academia. While some view ChatGPT as a groundbreaking innovation, others attribute its success to the effective integration of product development and market strategies. The emergence of ChatGPT, alongside GPT-4, marks a new phase in generative AI: these models generate content that is distinct from their training examples and exceed the capabilities of OpenAI's earlier GPT-3 model. Departing from the traditional supervised learning approach to IR tasks, ChatGPT challenges existing paradigms and brings new challenges and opportunities regarding text quality assurance, model bias, and efficiency. This paper examines the impact of ChatGPT on IR tasks and offers insights into its potential future developments.

Submitted 17 February, 2024; originally announced February 2024.

Comments: Survey Paper

Journal ref: Web Intelligence, vol. 22, no. 1, pp. 31-44, 2024

arXiv:2402.10519

A Simple Modeling for Gas Release During Annealing of Irradiated Nuclear Fuel

Authors: Jimmy Losfeld, Lionel Desgranges, Yves Pontillon, Gianguido Baldinozzi

Abstract: We have developed a model of gas flow in spent nuclear fuel during annealing. It postulates that the gas release during an isothermal plateau at 1200 °C corresponds to the equilibrium between over-pressurized gas reservoirs in the fuel sample and the free surface, at atmospheric pressure, to which they are connected.

Submitted 16 February, 2024; originally announced February 2024.

Journal ref: Transactions of the American Nuclear Society, 2023, 128 (1), pp.404-407

Showing 1–50 of 924 results for author: Jimmy