-
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Authors:
Cheng-Yu Hsieh,
Yung-Sung Chuang,
Chun-Liang Li,
Zifeng Wang,
Long T. Le,
Abhishek Kumar,
James Glass,
Alexander Ratner,
Chen-Yu Lee,
Ranjay Krishna,
Tomas Pfister
Abstract:
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of their input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
Submitted 23 June, 2024;
originally announced June 2024.
-
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
Authors:
I-Hung Hsu,
Zifeng Wang,
Long T. Le,
Lesly Miculicich,
Nanyun Peng,
Chen-Yu Lee,
Tomas Pfister
Abstract:
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. Larger LM responses that closely align with the smaller LMs' output, which relies exclusively on cited documents, are verified. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant performance gains of 1.5% to 7% absolute average without any required model fine-tuning.
Submitted 24 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
Authors:
Long Tan Le,
Han Shu,
Tung-Anh Nguyen,
Choong Seon Hong,
Nguyen H. Tran
Abstract:
While astonishingly capable, Large Language Models (LLMs) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with instability, whereas preference optimization methods are limited by their overfitting to pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. In particular, $i$REPO employs self-generated datasets labelled by empirical human (or AI annotator) preference to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by theoretical guarantees for achieving optimal results under ideal assumptions and providing a practical performance-gap result without such assumptions. Experimental results with Phi-2 and Mistral-7B demonstrate that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators. Furthermore, our approach surpasses preference optimization baselines in evaluations using the Language Model Evaluation Harness and Multi-turn benchmarks.
Submitted 24 May, 2024;
originally announced May 2024.
-
CodecLM: Aligning Language Models with Tailored Synthetic Data
Authors:
Zifeng Wang,
Chun-Liang Li,
Vincent Perot,
Long T. Le,
** Miao,
Zizhao Zhang,
Chen-Yu Lee,
Tomas Pfister
Abstract:
Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor and time cost of collecting or annotating data by humans, researchers have started to explore the use of LLMs to generate instruction-aligned synthetic data. Recent works focus on generating diverse instructions and applying LLMs to increase instruction complexity, often neglecting downstream use cases. It remains unclear how to tailor high-quality data to elicit better instruction-following abilities in different target instruction distributions and LLMs. To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on the Encode-Decode principles, we use LLMs as codecs to guide the data generation process. We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution, and then decode metadata to create tailored instructions. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples. Extensive experiments on four open-domain instruction following benchmarks validate the effectiveness of CodecLM over the current state of the art.
Submitted 8 April, 2024;
originally announced April 2024.
-
Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency
Authors:
Long Tan Le,
Tuan Dung Nguyen,
Tung-Anh Nguyen,
Choong Seon Hong,
Nguyen H. Tran
Abstract:
Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhances privacy preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that effectively employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in the alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models up to 4 times smaller in communication size and with a 1.5 times lower memory footprint during training.
Submitted 27 September, 2023;
originally announced September 2023.
-
Evaluating the impact of an explainable machine learning system on the interobserver agreement in chest radiograph interpretation
Authors:
Hieu H. Pham,
Ha Q. Nguyen,
Hieu T. Nguyen,
Linh T. Le,
Khanh Lam
Abstract:
We conducted a prospective study to measure the clinical impact of an explainable machine learning system on interobserver agreement in chest radiograph interpretation. The AI system, which we call VinDr-CXR, when used as a diagnosis-supporting tool, significantly improved the agreement between six radiologists with an increase of 1.5% in mean Fleiss' Kappa. In addition, we observed that, after the radiologists consulted the AI's suggestions, the agreement between each radiologist and the system increased by 3.3% in mean Cohen's Kappa. This work has been accepted for publication in IEEE Access, and this paper is our short version submitted to the Midwest Machine Learning Symposium (MMLS 2023), Chicago, IL, USA.
Submitted 1 April, 2023;
originally announced April 2023.
-
Federated PCA on Grassmann Manifold for Anomaly Detection in IoT Networks
Authors:
Tung-Anh Nguyen,
Jiayu He,
Long Tan Le,
Wei Bao,
Nguyen H. Tran
Abstract:
In the era of Internet of Things (IoT), network-wide anomaly detection is a crucial part of monitoring IoT networks due to the inherent security vulnerabilities of most IoT devices. Principal Components Analysis (PCA) has been proposed to separate network traffics into two disjoint subspaces corresponding to normal and malicious behaviors for anomaly detection. However, the privacy concerns and limitations of devices' computing resources compromise the practical effectiveness of PCA. We propose a federated PCA-based Grassmannian optimization framework that coordinates IoT devices to aggregate a joint profile of normal network behaviors for anomaly detection. First, we introduce a privacy-preserving federated PCA framework to simultaneously capture the profile of various IoT devices' traffic. Then, we investigate the alternating direction method of multipliers gradient-based learning on the Grassmann manifold to guarantee fast training and the absence of detecting latency using limited computational resources. Empirical results on the NSL-KDD dataset demonstrate that our method outperforms baseline approaches. Finally, we show that the Grassmann manifold algorithm is highly adapted for IoT anomaly detection, which permits drastically reducing the analysis time of the system. To the best of our knowledge, this is the first federated PCA algorithm for anomaly detection meeting the requirements of IoT networks.
Submitted 10 January, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
An Accurate and Explainable Deep Learning System Improves Interobserver Agreement in the Interpretation of Chest Radiograph
Authors:
Hieu H. Pham,
Ha Q. Nguyen,
Hieu T. Nguyen,
Linh T. Le,
Lam Khanh
Abstract:
Recent artificial intelligence (AI) algorithms have achieved radiologist-level performance on various medical classification tasks. However, only a few studies addressed the localization of abnormal findings from CXR scans, which is essential in explaining the image-level classification to radiologists. We introduce in this paper an explainable deep learning system called VinDr-CXR that can classify a CXR scan into multiple thoracic diseases and, at the same time, localize most types of critical findings on the image. VinDr-CXR was trained on 51,485 CXR scans with radiologist-provided bounding box annotations. It demonstrated a comparable performance to experienced radiologists in classifying 6 common thoracic diseases on a retrospective validation set of 3,000 CXR scans, with a mean area under the receiver operating characteristic curve (AUROC) of 0.967 (95% confidence interval [CI]: 0.958-0.975). The VinDr-CXR was also externally validated in independent patient cohorts and showed its robustness. For the localization task with 14 types of lesions, our free-response receiver operating characteristic (FROC) analysis showed that the VinDr-CXR achieved a sensitivity of 80.2% at the rate of 1.0 false-positive lesion identified per scan. A prospective study was also conducted to measure the clinical impact of the VinDr-CXR in assisting six experienced radiologists. The results indicated that the proposed system, when used as a diagnosis supporting tool, significantly improved the agreement between radiologists themselves with an increase of 1.5% in mean Fleiss' Kappa. We also observed that, after the radiologists consulted VinDr-CXR's suggestions, the agreement between each of them and the system was remarkably increased by 3.3% in mean Cohen's Kappa.
Submitted 6 August, 2022;
originally announced August 2022.
-
On the Generalization of Wasserstein Robust Federated Learning
Authors:
Tung-Anh Nguyen,
Tuan Dung Nguyen,
Long Tan Le,
Canh T. Dinh,
Nguyen H. Tran
Abstract:
In federated learning, participating clients typically possess non-i.i.d. data, posing a significant challenge to generalization to unseen distributions. To address this, we propose a Wasserstein distributionally robust optimization scheme called WAFL. Leveraging its duality, we frame WAFL as an empirical surrogate risk minimization problem, and solve it using a local SGD-based algorithm with convergence guarantees. We show that the robustness of WAFL is more general than related approaches, and the generalization bound is robust to all adversarial distributions inside the Wasserstein ball (ambiguity set). Since the center location and radius of the Wasserstein ball can be suitably modified, WAFL shows its applicability not only in robustness but also in domain adaptation. Through empirical evaluation, we demonstrate that WAFL generalizes better than the vanilla FedAvg in non-i.i.d. settings, and is more robust than other related methods in distribution shift settings. Further, using benchmark datasets we show that WAFL is capable of generalizing to unseen target domains.
Submitted 3 June, 2022;
originally announced June 2022.
-
VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography
Authors:
Hieu T. Nguyen,
Ha Q. Nguyen,
Hieu H. Pham,
Khanh Lam,
Linh T. Le,
Minh Dao,
Van Vu
Abstract:
Mammography, or breast X-ray, is the most widely used imaging modality to detect cancer and other breast diseases. Recent studies have shown that deep learning-based computer-assisted detection and diagnosis (CADe or CADx) tools have been developed to support physicians and improve the accuracy of interpreting mammography. However, most published datasets of mammography are either limited on sample size or digitalized from screen-film mammography (SFM), hindering the development of CADe and CADx tools which are developed based on full-field digital mammography (FFDM). To overcome this challenge, we introduce VinDr-Mammo - a new benchmark dataset of FFDM for detecting and diagnosing breast cancer and other diseases in mammography. The dataset consists of 5,000 mammography exams, each of which has four standard views and is double read with disagreement (if any) being resolved by arbitration. It is created for the assessment of Breast Imaging Reporting and Data System (BI-RADS) and density at the breast level. In addition, the dataset also provides the category, location, and BI-RADS assessment of non-benign findings. We make VinDr-Mammo publicly available on PhysioNet as a new imaging resource to promote advances in developing CADe and CADx tools for breast cancer screening.
Submitted 16 March, 2023; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Interpretable Sequence Learning for COVID-19 Forecasting
Authors:
Sercan O. Arik,
Chun-Liang Li,
**sung Yoon,
Rajarishi Sinha,
Arkady Epshteyn,
Long T. Le,
Vikas Menon,
Shashank Singh,
Leyou Zhang,
Nate Yoder,
Martin Nikoltchev,
Yash Sonthalia,
Hootan Nakhost,
Elli Kanal,
Tomas Pfister
Abstract:
We propose a novel approach that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19. Our model is explainable by design as it explicitly shows how different compartments evolve and it uses interpretable encoders to incorporate covariates and improve performance. Explainability is valuable to ensure that the model's forecasts are credible to epidemiologists and to instill confidence in end-users such as policy makers and healthcare institutions. Our model can be applied at different geographic resolutions, and here we demonstrate it for states and counties in the United States. We show that our model provides more accurate forecasts, in metrics averaged across the entire US, than state-of-the-art alternatives, and that it provides qualitatively meaningful explanatory insights. Lastly, we analyze the performance of our model for different subgroups based on the subgroup distributions within the counties.
Submitted 13 January, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Using Entity Relations for Opinion Mining of Vietnamese Comments
Authors:
P. T. Nguyen,
L. T. Le,
V. M. Ngo,
P. M. Nguyen
Abstract:
In this paper, we propose several novel techniques to extract and mine opinions from Vietnamese customer reviews of a number of products traded on e-commerce sites in Vietnam. The assessment is based on the emotional level of customers toward a specific product such as a mobile phone or laptop. We exploit the features of these products because they are of great interest to customers and many such products are available in the Vietnamese e-commerce market. From this, the likes and dislikes of customers regarding the exploited products can be determined.
Submitted 16 May, 2019;
originally announced May 2019.