-
Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning
Authors:
Ming Kuo,
Shouvon Sarker,
Lijun Qian,
Yujian Fu,
Xiangfang Li,
Xishuang Dong
Abstract:
In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance…
▽ More
In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance. Based on these predictions, personalized recommendations for resources and learning paths can be made to meet individual needs. Recent advancements in deep learning have successfully enhanced knowledge tracking through Deep Knowledge Tracing (DKT). This paper introduces generative AI models to further enhance DKT. Generative AI models, rooted in deep learning, are trained to generate synthetic data, addressing data scarcity challenges in various applications across fields such as natural language processing (NLP) and computer vision (CV). This study aims to tackle data shortage issues in student learning records to enhance DKT performance for PAL. Specifically, it employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT. The proposed method's effectiveness is validated through extensive experiments on ASSISTments datasets. The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance, particularly in scenarios with small data for training and large data for testing.
△ Less
Submitted 24 April, 2024;
originally announced May 2024.
-
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models
Authors:
**gyang Zhang,
**gwei Sun,
Eric Yeats,
Yang Ouyang,
Martin Kuo,
Jianyi Zhang,
Hao Frank Yang,
Hai Li
Abstract:
The problem of pre-training data detection for large language models (LLMs) has received growing attention due to its implications in critical issues like copyright violation and test data contamination. Despite improved performance, existing methods (including the state-of-the-art, Min-K%) are mostly developed upon simple heuristics and lack solid, reasonable foundations. In this work, we propose…
▽ More
The problem of pre-training data detection for large language models (LLMs) has received growing attention due to its implications in critical issues like copyright violation and test data contamination. Despite improved performance, existing methods (including the state-of-the-art, Min-K%) are mostly developed upon simple heuristics and lack solid, reasonable foundations. In this work, we propose a novel and theoretically motivated methodology for pre-training data detection, named Min-K%++. Specifically, we present a key insight that training samples tend to be local maxima of the modeled distribution along each input dimension through maximum likelihood training, which in turn allow us to insightfully translate the problem into identification of local maxima. Then, we design our method accordingly that works under the discrete distribution modeled by LLMs, whose core idea is to determine whether the input forms a mode or has relatively high probability under the conditional categorical distribution. Empirically, the proposed method achieves new SOTA performance across multiple settings. On the WikiMIA benchmark, Min-K%++ outperforms the runner-up by 6.2% to 10.5% in detection AUROC averaged over five models. On the more challenging MIMIR benchmark, it consistently improves upon reference-free methods while performing on par with reference-based method that requires an extra reference model.
△ Less
Submitted 23 May, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining
Authors:
Martin Kuo,
Jianyi Zhang,
Yiran Chen
Abstract:
Building on the cost-efficient pretraining advancements brought about by Crammed BERT, we enhance its performance and interpretability further by introducing a novel pretrained model Dependency Agreement Crammed BERT (DACBERT) and its two-stage pretraining framework - Dependency Agreement Pretraining. This framework, grounded by linguistic theories, seamlessly weaves syntax and semantic informatio…
▽ More
Building on the cost-efficient pretraining advancements brought about by Crammed BERT, we enhance its performance and interpretability further by introducing a novel pretrained model Dependency Agreement Crammed BERT (DACBERT) and its two-stage pretraining framework - Dependency Agreement Pretraining. This framework, grounded by linguistic theories, seamlessly weaves syntax and semantic information into the pretraining process. The first stage employs four dedicated submodels to capture representative dependency agreements at the chunk level, effectively converting these agreements into embeddings. The second stage uses these refined embeddings, in tandem with conventional BERT embeddings, to guide the pretraining of the rest of the model. Evaluated on the GLUE benchmark, our DACBERT demonstrates notable improvement across various tasks, surpassing Crammed BERT by 3.13% in the RTE task and by 2.26% in the MRPC task. Furthermore, our method boosts the average GLUE score by 0.83%, underscoring its significant potential. The pretraining process can be efficiently executed on a single GPU within a 24-hour cycle, necessitating no supplementary computational resources or extending the pretraining duration compared with the Crammed BERT. Extensive studies further illuminate our approach's instrumental role in bolstering the interpretability of pretrained language models for natural language understanding tasks.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Large Language Models on the Chessboard: A Study on ChatGPT's Formal Language Comprehension and Complex Reasoning Skills
Authors:
Mu-Tien Kuo,
Chih-Chung Hsueh,
Richard Tzong-Han Tsai
Abstract:
While large language models have made strides in natural language processing, their proficiency in complex reasoning tasks requiring formal language comprehension, such as chess, remains less investigated. This paper probes the performance of ChatGPT, a sophisticated language model by OpenAI in tackling such complex reasoning tasks, using chess as a case study. Through robust metrics examining bot…
▽ More
While large language models have made strides in natural language processing, their proficiency in complex reasoning tasks requiring formal language comprehension, such as chess, remains less investigated. This paper probes the performance of ChatGPT, a sophisticated language model by OpenAI in tackling such complex reasoning tasks, using chess as a case study. Through robust metrics examining both the legality and quality of moves, we assess ChatGPT's understanding of the chessboard, adherence to chess rules, and strategic decision-making abilities. Our evaluation identifies limitations within ChatGPT's attention mechanism that affect its formal language comprehension and uncovers the model's underdeveloped self-regulation abilities. Our study also reveals ChatGPT's propensity for a coherent strategy in its gameplay and a noticeable uptick in decision-making assertiveness when the model is presented with a greater volume of natural language or possesses a more lucid understanding of the state of the chessboard. These findings contribute to the growing exploration of language models' abilities beyond natural language processing, providing valuable information for future research towards models demonstrating human-like cognitive abilities.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Towards Building the Federated GPT: Federated Instruction Tuning
Authors:
Jianyi Zhang,
Saeed Vahidian,
Martin Kuo,
Chunyuan Li,
Ruiyi Zhang,
Tong Yu,
Yufan Zhou,
Guoyin Wang,
Yiran Chen
Abstract:
While "instruction-tuned" generative large language models (LLMs) have demonstrated an impressive ability to generalize to new tasks, the training phases heavily rely on large amounts of diverse and high-quality instruction data (such as ChatGPT and GPT-4). Unfortunately, acquiring high-quality data, especially when it comes to human-written data, can pose significant challenges both in terms of c…
▽ More
While "instruction-tuned" generative large language models (LLMs) have demonstrated an impressive ability to generalize to new tasks, the training phases heavily rely on large amounts of diverse and high-quality instruction data (such as ChatGPT and GPT-4). Unfortunately, acquiring high-quality data, especially when it comes to human-written data, can pose significant challenges both in terms of cost and accessibility. Moreover, concerns related to privacy can further limit access to such data, making the process of obtaining it a complex and nuanced undertaking. Consequently, this hinders the generality of the tuned models and may restrict their effectiveness in certain contexts. To tackle this issue, our study introduces a new approach called Federated Instruction Tuning (FedIT), which leverages federated learning (FL) as the learning framework for the instruction tuning of LLMs. This marks the first exploration of FL-based instruction tuning for LLMs. This is especially important since text data is predominantly generated by end users. Therefore, it is imperative to design and adapt FL approaches to effectively leverage these users' diverse instructions stored on local devices, while preserving privacy and ensuring data security. In the current paper, by conducting widely used GPT-4 auto-evaluation, we demonstrate that by exploiting the heterogeneous and diverse sets of instructions on the client's end with the proposed framework FedIT, we improved the performance of LLMs compared to centralized training with only limited local instructions. Further, in this paper, we developed a Github repository named Shepherd. This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.
△ Less
Submitted 29 January, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment
Authors:
Weihang Dai,
Xiaomeng Li,
Wan Hang Keith Chiu,
Michael D. Kuo,
Kwang-Ting Cheng
Abstract:
Image regression tasks for medical applications, such as bone mineral density (BMD) estimation and left-ventricular ejection fraction (LVEF) prediction, play an important role in computer-aided disease assessment. Most deep regression methods train the neural network with a single regression loss function like MSE or L1 loss. In this paper, we propose the first contrastive learning framework for d…
▽ More
Image regression tasks for medical applications, such as bone mineral density (BMD) estimation and left-ventricular ejection fraction (LVEF) prediction, play an important role in computer-aided disease assessment. Most deep regression methods train the neural network with a single regression loss function like MSE or L1 loss. In this paper, we propose the first contrastive learning framework for deep image regression, namely AdaCon, which consists of a feature learning branch via a novel adaptive-margin contrastive loss and a regression prediction branch. Our method incorporates label distance relationships as part of the learned feature representations, which allows for better performance in downstream regression tasks. Moreover, it can be used as a plug-and-play module to improve performance of existing regression methods. We demonstrate the effectiveness of AdaCon on two medical image regression tasks, ie, bone mineral density estimation from X-ray images and left-ventricular ejection fraction prediction from echocardiogram videos. AdaCon leads to relative improvements of 3.3% and 5.9% in MAE over state-of-the-art BMD estimation and LVEF prediction methods, respectively.
△ Less
Submitted 22 December, 2021;
originally announced December 2021.
-
Secure Links: Secure-by-Design Communications in IEC 61499 Industrial Control Applications
Authors:
Awais Tanveer,
Roopak Sinha,
Matthew M. Y. Kuo
Abstract:
Increasing automation and external connectivity in industrial control systems (ICS) demand a greater emphasis on software-level communication security. In this article, we propose a secure-by-design development method for building ICS applications, where requirements from security standards like ISA/IEC 62443 are fulfilled by design-time abstractions called secure links. Proposed as an extension t…
▽ More
Increasing automation and external connectivity in industrial control systems (ICS) demand a greater emphasis on software-level communication security. In this article, we propose a secure-by-design development method for building ICS applications, where requirements from security standards like ISA/IEC 62443 are fulfilled by design-time abstractions called secure links. Proposed as an extension to the IEC 61499 development standard, secure links incorporate both light-weight and traditional security mechanisms into applications with negligible effort. Applications containing secure links can be automatically compiled into fully IEC 61499-compliant software. Experimental results show secure links significantly reduce design and code complexity and improve application maintainability and requirements traceability.
△ Less
Submitted 24 July, 2021;
originally announced July 2021.
-
Tag and Correct: Question aware Open Information Extraction with Two-stage Decoding
Authors:
Martin Kuo,
Yaobo Liang,
Lei Ji,
Nan Duan,
Linjun Shou,
Ming Gong,
Peng Chen
Abstract:
Question Aware Open Information Extraction (Question aware Open IE) takes question and passage as inputs, outputting an answer tuple which contains a subject, a predicate, and one or more arguments. Each field of answer is a natural language word sequence and is extracted from the passage. The semi-structured answer has two advantages which are more readable and falsifiable compared to span answer…
▽ More
Question Aware Open Information Extraction (Question aware Open IE) takes question and passage as inputs, outputting an answer tuple which contains a subject, a predicate, and one or more arguments. Each field of answer is a natural language word sequence and is extracted from the passage. The semi-structured answer has two advantages which are more readable and falsifiable compared to span answer. There are two approaches to solve this problem. One is an extractive method which extracts candidate answers from the passage with the Open IE model, and ranks them by matching with questions. It fully uses the passage information at the extraction step, but the extraction is independent to the question. The other one is the generative method which uses a sequence to sequence model to generate answers directly. It combines the question and passage as input at the same time, but it generates the answer from scratch, which does not use the facts that most of the answer words come from in the passage. To guide the generation by passage, we present a two-stage decoding model which contains a tagging decoder and a correction decoder. At the first stage, the tagging decoder will tag keywords from the passage. At the second stage, the correction decoder will generate answers based on tagged keywords. Our model could be trained end-to-end although it has two stages. Compared to previous generative models, we generate better answers by generating coarse to fine. We evaluate our model on WebAssertions (Yan et al., 2018) which is a Question aware Open IE dataset. Our model achieves a BLEU score of 59.32, which is better than previous generative methods.
△ Less
Submitted 15 September, 2020;
originally announced September 2020.
-
A Variational Approach to Unsupervised Sentiment Analysis
Authors:
Ziqian Zeng,
Wenxuan Zhou,
Xin Liu,
Zizheng Lin,
Yangqin Song,
Michael David Kuo,
Wan Hang Keith Chiu
Abstract:
In this paper, we propose a variational approach to unsupervised sentiment analysis. Instead of using ground truth provided by domain experts, we use target-opinion word pairs as a supervision signal. For example, in a document snippet "the room is big," (room, big) is a target-opinion word pair. These word pairs can be extracted by using dependency parsers and simple rules. Our objective function…
▽ More
In this paper, we propose a variational approach to unsupervised sentiment analysis. Instead of using ground truth provided by domain experts, we use target-opinion word pairs as a supervision signal. For example, in a document snippet "the room is big," (room, big) is a target-opinion word pair. These word pairs can be extracted by using dependency parsers and simple rules. Our objective function is to predict an opinion word given a target word while our ultimate goal is to learn a sentiment classifier. By introducing a latent variable, i.e., the sentiment polarity, to the objective function, we can inject the sentiment classifier to the objective function via the evidence lower bound. We can learn a sentiment classifier by optimizing the lower bound. We also impose sophisticated constraints on opinion words as regularization which encourages that if two documents have similar (dissimilar) opinion words, the sentiment classifiers should produce similar (different) probability distribution. We apply our method to sentiment analysis on customer reviews and clinical narratives. The experiment results show our method can outperform unsupervised baselines in sentiment analysis task on both domains, and our method obtains comparable results to the supervised method with hundreds of labels per aspect in customer reviews domain, and obtains comparable results to supervised methods in clinical narratives domain.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.
-
Measuring group-separability in geometrical space for evaluation of pattern recognition and embedding algorithms
Authors:
A. Acevedo,
S. Ciucci,
MJ. Kuo,
C. Duran,
CV. Cannistraci
Abstract:
Evaluating data separation in a geometrical space is fundamental for pattern recognition. A plethora of dimensionality reduction (DR) algorithms have been developed in order to reveal the emergence of geometrical patterns in a low dimensional visible representation space, in which high-dimensional samples similarities are approximated by geometrical distances. However, statistical measures to eval…
▽ More
Evaluating data separation in a geometrical space is fundamental for pattern recognition. A plethora of dimensionality reduction (DR) algorithms have been developed in order to reveal the emergence of geometrical patterns in a low dimensional visible representation space, in which high-dimensional samples similarities are approximated by geometrical distances. However, statistical measures to evaluate directly in the low dimensional geometrical space the sample group separability attaiend by these DR algorithms are missing. Certainly, these separability measures could be used both to compare algorithms performance and to tune algorithms parameters. Here, we propose three statistical measures (named as PSI-ROC, PSI-PR, and PSI-P) that have origin from the Projection Separability (PS) rationale introduced in this study, which is expressly designed to assess group separability of data samples in a geometrical space. Traditional cluster validity indices (CVIs) might be applied in this context but they show limitations because they are not specifically tailored for DR. Our PS measures are compared to six baseline cluster validity indices, using five non-linear datasets and six different DR algorithms. The results provide clear evidence that statistical-based measures based on PS rationale are more accurate than CVIs and can be adopted to control the tuning of parameter-dependent DR algorithms.
△ Less
Submitted 28 December, 2019;
originally announced December 2019.
-
Appearance and Shape from Water Reflection
Authors:
Ryo Kawahara,
Meng-Yu Jennifer Kuo,
Shohei Nobuhara,
Ko Nishino
Abstract:
This paper introduces single-image geometric and appearance reconstruction from water reflection photography, i.e., images capturing direct and water-reflected real-world scenes. Water reflection offers an additional viewpoint to the direct sight, collectively forming a stereo pair. The water-reflected scene, however, includes internally scattered and reflected environmental illumination in additi…
▽ More
This paper introduces single-image geometric and appearance reconstruction from water reflection photography, i.e., images capturing direct and water-reflected real-world scenes. Water reflection offers an additional viewpoint to the direct sight, collectively forming a stereo pair. The water-reflected scene, however, includes internally scattered and reflected environmental illumination in addition to the scene radiance, which precludes direct stereo matching. We derive a principled iterative method that disentangles this scene radiometry and geometry for reconstructing 3D scene structure as well as its high-dynamic range appearance. In the presence of waves, we simultaneously recover the wave geometry as surface normal perturbations of the water surface. Most important, we show that the water reflection enables calibration of the camera. In other words, for the first time, we show that capturing a direct and water-reflected scene in a single exposure forms a self-calibrating HDR catadioptric stereo camera. We demonstrate our method on a number of images taken in the wild. The results demonstrate a new means for leveraging this accidental catadioptric camera.
△ Less
Submitted 7 January, 2020; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Maximum-Likelihood Priority-First Search Decodable Codes for Combined Channel Estimation and Error Protection
Authors:
Chia-Lung Wu,
Po-Ning Chen,
Yunghsiang S. Han,
Ming-Hsin Kuo
Abstract:
The code that combines channel estimation and error protection has received general attention recently, and has been considered a promising methodology to compensate multi-path fading effect. It has been shown by simulations that such code design can considerably improve the system performance over the conventional design with separate channel estimation and error protection modules under the sa…
▽ More
The code that combines channel estimation and error protection has received general attention recently, and has been considered a promising methodology to compensate multi-path fading effect. It has been shown by simulations that such code design can considerably improve the system performance over the conventional design with separate channel estimation and error protection modules under the same code rate. Nevertheless, the major obstacle that prevents from the practice of the codes is that the existing codes are mostly searched by computers, and hence exhibit no good structure for efficient decoding. Hence, the time-consuming exhaustive search becomes the only decoding choice, and the decoding complexity increases dramatically with the codeword length. In this paper, by optimizing the signal-tonoise ratio, we found a systematic construction for the codes for combined channel estimation and error protection, and confirmed its equivalence in performance to the computer-searched codes by simulations. Moreover, the structural codes that we construct by rules can now be maximum-likelihoodly decodable in terms of a newly derived recursive metric for use of the priority-first search decoding algorithm. Thus,the decoding complexity reduces significantly when compared with that of the exhaustive decoder. The extension code design for fast-fading channels is also presented. Simulations conclude that our constructed extension code is robust in performance even if the coherent period is shorter than the codeword length.
△ Less
Submitted 17 December, 2007;
originally announced December 2007.