-
Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study
Authors:
Zhe He,
Balu Bhasuran,
Qiao **,
Shubo Tian,
Karim Hanna,
Cindy Shavor,
Lisbeth Garcia Arguello,
Patrick Murray,
Zhiyong Lu
Abstract:
Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with au…
▽ More
Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with augmentation approaches. We first collected lab test results related question and answer data from Yahoo! Answers and selected 53 QA pairs for this study. Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from four LLMs including GPT-4, Meta LLaMA 2, MedAlpaca, and ORCA_mini. We first assessed the similarity of their answers using standard QA similarity-based evaluation metrics including ROUGE, BLEU, METEOR, BERTScore. We also utilized an LLM-based evaluator to judge whether a target model has higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. Finally, we performed a manual evaluation with medical experts for all the responses to seven selected questions on the same four aspects. The results of Win Rate and medical expert evaluation both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all four aspects (relevance, correctness, helpfulness, and safety). However, LLM responses occasionally also suffer from a lack of interpretation in one's medical context, incorrect statements, and lack of references. We find that compared to other three LLMs and human answer from the Q&A website, GPT-4's responses are more accurate, helpful, relevant, and safer. However, there are cases which GPT-4 responses are inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses.
△ Less
Submitted 23 January, 2024;
originally announced February 2024.
-
Hyperspectral Lightcurve Inversion for Attitude Determination
Authors:
Simão da Graça Marto,
Massimiliano Vasile,
Andrew Campbell,
Paul Murray,
Stephen Marshall,
Vasili Savitski
Abstract:
Spectral lightcurves consisting of time series single-pixel spectral measurements of spacecraft are used to infer the spacecraft's attitude and rotation. Two methods are used. One based on numerical optimisation of a regularised least squares cost function, and another based on machine learning with a neural network model. The aim is to work with minimal information, thus no prior is available on…
▽ More
Spectral lightcurves consisting of time series single-pixel spectral measurements of spacecraft are used to infer the spacecraft's attitude and rotation. Two methods are used. One based on numerical optimisation of a regularised least squares cost function, and another based on machine learning with a neural network model. The aim is to work with minimal information, thus no prior is available on the attitude nor on the inertia tensor. The theoretical and practical aspects of this task are investigated, and the methodology is tested on synthetic data. Results are shown based on synthetic data.
△ Less
Submitted 19 December, 2023;
originally announced January 2024.
-
Space Object Identification and Classification from Hyperspectral Material Analysis
Authors:
Massimiliano Vasile,
Lewis Walker,
Andrew Campbell,
Simao Marto,
Paul Murray,
Stephen Marshall,
Vasili Savitski
Abstract:
This paper presents a data processing pipeline designed to extract information from the hyperspectral signature of unknown space objects. The methodology proposed in this paper determines the material composition of space objects from single pixel images. Two techniques are used for material identification and classification: one based on machine learning and the other based on a least square matc…
▽ More
This paper presents a data processing pipeline designed to extract information from the hyperspectral signature of unknown space objects. The methodology proposed in this paper determines the material composition of space objects from single pixel images. Two techniques are used for material identification and classification: one based on machine learning and the other based on a least square match with a library of known spectra. From this information, a supervised machine learning algorithm is used to classify the object into one of several categories based on the detection of materials on the object. The behaviour of the material classification methods is investigated under non-ideal circumstances, to determine the effect of weathered materials, and the behaviour when the training library is missing a material that is present in the object being observed. Finally the paper will present some preliminary results on the identification and classification of space objects.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Sig-Splines: universal approximation and convex calibration of time series generative models
Authors:
Magnus Wiese,
Phillip Murray,
Ralf Korn
Abstract:
We propose a novel generative model for multivariate discrete-time time series data. Drawing inspiration from the construction of neural spline flows, our algorithm incorporates linear transformations and the signature transform as a seamless substitution for traditional neural networks. This approach enables us to achieve not only the universality property inherent in neural networks but also int…
▽ More
We propose a novel generative model for multivariate discrete-time time series data. Drawing inspiration from the construction of neural spline flows, our algorithm incorporates linear transformations and the signature transform as a seamless substitution for traditional neural networks. This approach enables us to achieve not only the universality property inherent in neural networks but also introduces convexity in the model's parameters.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.
-
ImPaKT: A Dataset for Open-Schema Knowledge Base Construction
Authors:
Luke Vilnis,
Zach Fisher,
Bhargav Kanagal,
Patrick Murray,
Sumit Sanghai
Abstract:
Large language models have ushered in a golden age of semantic parsing. The seq2seq paradigm allows for open-schema and abstractive attribute and relation extraction given only small amounts of finetuning data. Language model pretraining has simultaneously enabled great strides in natural language inference, reasoning about entailment and implication in free text. These advances motivate us to con…
▽ More
Large language models have ushered in a golden age of semantic parsing. The seq2seq paradigm allows for open-schema and abstractive attribute and relation extraction given only small amounts of finetuning data. Language model pretraining has simultaneously enabled great strides in natural language inference, reasoning about entailment and implication in free text. These advances motivate us to construct ImPaKT, a dataset for open-schema information extraction, consisting of around 2500 text snippets from the C4 corpus, in the shop** domain (product buying guides), professionally annotated with extracted attributes, types, attribute summaries (attribute schema discovery from idiosyncratic text), many-to-one relations between compound and atomic attributes, and implication relations. We release this data in hope that it will be useful in fine tuning semantic parsers for information extraction and knowledge base construction across a variety of domains. We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Authors:
Luke Vilnis,
Yury Zemlyanskiy,
Patrick Murray,
Alexandre Passos,
Sumit Sanghai
Abstract:
Decoding methods for large language models often trade-off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and…
▽ More
Decoding methods for large language models often trade-off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others), are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
△ Less
Submitted 1 June, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Challenges in implementing DDR3 memory interface on PCB systems: a methodology for interfacing DDR3 SDRAM DIMM to an FPGA
Authors:
Phil Murray,
Feras Al-Hawari
Abstract:
Undoubtedly faster, larger and lower power per bit, but just how do you go about interfacing a DDR3 SDRAM DIMM to an FPGA? The DDR3 standard addresses the faster, more bandwidth and lower power per bit need, but it introduces new design challenges in addition to challenges introduced by DDR2 ODT, slew rate derating, etc. The DDR3 fly-by topology requirement means customers designing DDR3 memories…
▽ More
Undoubtedly faster, larger and lower power per bit, but just how do you go about interfacing a DDR3 SDRAM DIMM to an FPGA? The DDR3 standard addresses the faster, more bandwidth and lower power per bit need, but it introduces new design challenges in addition to challenges introduced by DDR2 ODT, slew rate derating, etc. The DDR3 fly-by topology requirement means customers designing DDR3 memories must now account for write leveling and read de-skew on the PCB. This paper will cover modeling, simulation, and physical layout approaches required to meet JEDEC-defined termination and tight timing requirements for designing DDR3 memory interfaces on PCB systems.
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
Risk-Neutral Market Simulation
Authors:
Magnus Wiese,
Phillip Murray
Abstract:
We develop a risk-neutral spot and equity option market simulator for a single underlying, under which the joint market process is a martingale. We leverage an efficient low-dimensional representation of the market which preserves no static arbitrage, and employ neural spline flows to simulate samples which are free from conditional drifts and are highly realistic in the sense that among all possi…
▽ More
We develop a risk-neutral spot and equity option market simulator for a single underlying, under which the joint market process is a martingale. We leverage an efficient low-dimensional representation of the market which preserves no static arbitrage, and employ neural spline flows to simulate samples which are free from conditional drifts and are highly realistic in the sense that among all possible risk-neutral simulators, the obtained risk-neutral simulator is the closest to the historical data with respect to the Kullback-Leibler divergence. Numerical experiments demonstrate the effectiveness and highlight both drift removal and fidelity of the calibrated simulator.
△ Less
Submitted 28 February, 2022;
originally announced February 2022.
-
Multi-Asset Spot and Option Market Simulation
Authors:
Magnus Wiese,
Ben Wood,
Alexandre Pachoud,
Ralf Korn,
Hans Buehler,
Phillip Murray,
Lianjun Bai
Abstract:
We construct realistic spot and equity option market simulators for a single underlying on the basis of normalizing flows. We address the high-dimensionality of market observed call prices through an arbitrage-free autoencoder that approximates efficient low-dimensional representations of the prices while maintaining no static arbitrage in the reconstructed surface. Given a multi-asset universe, w…
▽ More
We construct realistic spot and equity option market simulators for a single underlying on the basis of normalizing flows. We address the high-dimensionality of market observed call prices through an arbitrage-free autoencoder that approximates efficient low-dimensional representations of the prices while maintaining no static arbitrage in the reconstructed surface. Given a multi-asset universe, we leverage the conditional invertibility property of normalizing flows and introduce a scalable method to calibrate the joint distribution of a set of independent simulators while preserving the dynamics of each simulator. Empirical results highlight the goodness of the calibrated simulators and their fidelity.
△ Less
Submitted 13 December, 2021;
originally announced December 2021.