Search | arXiv e-print repository

Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning

Authors: Ming Kuo, Shouvon Sarker, Lijun Qian, Yujian Fu, Xiangfang Li, Xishuang Dong

Abstract: In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance. Based on these predictions, personalized recommendations for resources and learning paths can be made to meet individual needs. Recent advancements in deep learning have successfully enhanced knowledge tracing through Deep Knowledge Tracing (DKT). This paper introduces generative AI models to further enhance DKT. Generative AI models, rooted in deep learning, are trained to generate synthetic data, addressing data scarcity challenges in various applications across fields such as natural language processing (NLP) and computer vision (CV). This study aims to tackle data shortage issues in student learning records to enhance DKT performance for PAL. Specifically, it employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT. The proposed method's effectiveness is validated through extensive experiments on ASSISTments datasets. The experimental results demonstrate that the synthetic data generated by TabDDPM significantly improves DKT performance, particularly in scenarios with small data for training and large data for testing.

Submitted 24 April, 2024; originally announced May 2024.

arXiv:2404.02936 [pdf, other]

Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models

Authors: Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Frank Yang, Hai Li

Abstract: The problem of pre-training data detection for large language models (LLMs) has received growing attention due to its implications in critical issues like copyright violation and test data contamination. Despite improved performance, existing methods (including the state-of-the-art, Min-K%) are mostly developed upon simple heuristics and lack solid, reasonable foundations. In this work, we propose a novel and theoretically motivated methodology for pre-training data detection, named Min-K%++. Specifically, we present a key insight that training samples tend to be local maxima of the modeled distribution along each input dimension through maximum likelihood training, which in turn allows us to insightfully translate the problem into identification of local maxima. Then, we design our method accordingly that works under the discrete distribution modeled by LLMs, whose core idea is to determine whether the input forms a mode or has relatively high probability under the conditional categorical distribution. Empirically, the proposed method achieves new SOTA performance across multiple settings. On the WikiMIA benchmark, Min-K%++ outperforms the runner-up by 6.2% to 10.5% in detection AUROC averaged over five models. On the more challenging MIMIR benchmark, it consistently improves upon reference-free methods while performing on par with the reference-based method that requires an extra reference model.

Submitted 23 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Project page and code are available at https://zjysteven.github.io/mink-plus-plus/

arXiv:2311.04799 [pdf, other]

DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining

Authors: Martin Kuo, Jianyi Zhang, Yiran Chen

Abstract: Building on the cost-efficient pretraining advancements brought about by Crammed BERT, we enhance its performance and interpretability further by introducing a novel pretrained model Dependency Agreement Crammed BERT (DACBERT) and its two-stage pretraining framework - Dependency Agreement Pretraining. This framework, grounded by linguistic theories, seamlessly weaves syntax and semantic information into the pretraining process. The first stage employs four dedicated submodels to capture representative dependency agreements at the chunk level, effectively converting these agreements into embeddings. The second stage uses these refined embeddings, in tandem with conventional BERT embeddings, to guide the pretraining of the rest of the model. Evaluated on the GLUE benchmark, our DACBERT demonstrates notable improvement across various tasks, surpassing Crammed BERT by 3.13% in the RTE task and by 2.26% in the MRPC task. Furthermore, our method boosts the average GLUE score by 0.83%, underscoring its significant potential. The pretraining process can be efficiently executed on a single GPU within a 24-hour cycle, necessitating no supplementary computational resources or extending the pretraining duration compared with the Crammed BERT. Extensive studies further illuminate our approach's instrumental role in bolstering the interpretability of pretrained language models for natural language understanding tasks.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2308.15118 [pdf, other]

Large Language Models on the Chessboard: A Study on ChatGPT's Formal Language Comprehension and Complex Reasoning Skills

Authors: Mu-Tien Kuo, Chih-Chung Hsueh, Richard Tzong-Han Tsai

Abstract: While large language models have made strides in natural language processing, their proficiency in complex reasoning tasks requiring formal language comprehension, such as chess, remains less investigated. This paper probes the performance of ChatGPT, a sophisticated language model by OpenAI, in tackling such complex reasoning tasks, using chess as a case study. Through robust metrics examining both the legality and quality of moves, we assess ChatGPT's understanding of the chessboard, adherence to chess rules, and strategic decision-making abilities. Our evaluation identifies limitations within ChatGPT's attention mechanism that affect its formal language comprehension and uncovers the model's underdeveloped self-regulation abilities. Our study also reveals ChatGPT's propensity for a coherent strategy in its gameplay and a noticeable uptick in decision-making assertiveness when the model is presented with a greater volume of natural language or possesses a more lucid understanding of the state of the chessboard. These findings contribute to the growing exploration of language models' abilities beyond natural language processing, providing valuable information for future research towards models demonstrating human-like cognitive abilities.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2305.05644 [pdf, other]

Towards Building the Federated GPT: Federated Instruction Tuning

Authors: Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Yufan Zhou, Guoyin Wang, Yiran Chen

Abstract: While "instruction-tuned" generative large language models (LLMs) have demonstrated an impressive ability to generalize to new tasks, the training phases heavily rely on large amounts of diverse and high-quality instruction data (such as ChatGPT and GPT-4). Unfortunately, acquiring high-quality data, especially when it comes to human-written data, can pose significant challenges both in terms of cost and accessibility. Moreover, concerns related to privacy can further limit access to such data, making the process of obtaining it a complex and nuanced undertaking. Consequently, this hinders the generality of the tuned models and may restrict their effectiveness in certain contexts. To tackle this issue, our study introduces a new approach called Federated Instruction Tuning (FedIT), which leverages federated learning (FL) as the learning framework for the instruction tuning of LLMs. This marks the first exploration of FL-based instruction tuning for LLMs. This is especially important since text data is predominantly generated by end users. Therefore, it is imperative to design and adapt FL approaches to effectively leverage these users' diverse instructions stored on local devices, while preserving privacy and ensuring data security. In the current paper, by conducting widely used GPT-4 auto-evaluation, we demonstrate that by exploiting the heterogeneous and diverse sets of instructions on the client's end with the proposed framework FedIT, we improved the performance of LLMs compared to centralized training with only limited local instructions. Further, in this paper, we developed a GitHub repository named Shepherd. This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.

Submitted 29 January, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Project page: https://github.com/JayZhang42/FederatedGPT-Shepherd

arXiv:2112.11700 [pdf, other]

Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment

Authors: Weihang Dai, Xiaomeng Li, Wan Hang Keith Chiu, Michael D. Kuo, Kwang-Ting Cheng

Abstract: Image regression tasks for medical applications, such as bone mineral density (BMD) estimation and left-ventricular ejection fraction (LVEF) prediction, play an important role in computer-aided disease assessment. Most deep regression methods train the neural network with a single regression loss function like MSE or L1 loss. In this paper, we propose the first contrastive learning framework for deep image regression, namely AdaCon, which consists of a feature learning branch via a novel adaptive-margin contrastive loss and a regression prediction branch. Our method incorporates label distance relationships as part of the learned feature representations, which allows for better performance in downstream regression tasks. Moreover, it can be used as a plug-and-play module to improve performance of existing regression methods. We demonstrate the effectiveness of AdaCon on two medical image regression tasks, i.e., bone mineral density estimation from X-ray images and left-ventricular ejection fraction prediction from echocardiogram videos. AdaCon leads to relative improvements of 3.3% and 5.9% in MAE over state-of-the-art BMD estimation and LVEF prediction methods, respectively.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted in IEEE Transactions on Medical Imaging

arXiv:2107.11537 [pdf]

doi 10.1109/TII.2020.3009133.

Secure Links: Secure-by-Design Communications in IEC 61499 Industrial Control Applications

Authors: Awais Tanveer, Roopak Sinha, Matthew M. Y. Kuo

Abstract: Increasing automation and external connectivity in industrial control systems (ICS) demand a greater emphasis on software-level communication security. In this article, we propose a secure-by-design development method for building ICS applications, where requirements from security standards like ISA/IEC 62443 are fulfilled by design-time abstractions called secure links. Proposed as an extension to the IEC 61499 development standard, secure links incorporate both light-weight and traditional security mechanisms into applications with negligible effort. Applications containing secure links can be automatically compiled into fully IEC 61499-compliant software. Experimental results show secure links significantly reduce design and code complexity and improve application maintainability and requirements traceability.

Submitted 24 July, 2021; originally announced July 2021.

Comments: Journal paper, 11 pages, 10 figures, 3 tables

Journal ref: IEEE Transactions on Industrial Informatics 17(6)(2021), pp.3992-4002

arXiv:2009.07406 [pdf, other]

Tag and Correct: Question aware Open Information Extraction with Two-stage Decoding

Authors: Martin Kuo, Yaobo Liang, Lei Ji, Nan Duan, Linjun Shou, Ming Gong, Peng Chen

Abstract: Question Aware Open Information Extraction (Question aware Open IE) takes a question and a passage as inputs, outputting an answer tuple which contains a subject, a predicate, and one or more arguments. Each field of the answer is a natural language word sequence and is extracted from the passage. The semi-structured answer has two advantages over a span answer: it is more readable and more falsifiable. There are two approaches to solve this problem. One is an extractive method which extracts candidate answers from the passage with the Open IE model, and ranks them by matching with questions. It fully uses the passage information at the extraction step, but the extraction is independent of the question. The other one is the generative method which uses a sequence to sequence model to generate answers directly. It combines the question and passage as input at the same time, but it generates the answer from scratch, which does not use the fact that most of the answer words come from the passage. To guide the generation by passage, we present a two-stage decoding model which contains a tagging decoder and a correction decoder. At the first stage, the tagging decoder will tag keywords from the passage. At the second stage, the correction decoder will generate answers based on tagged keywords. Our model could be trained end-to-end although it has two stages. Compared to previous generative models, we generate better answers by generating coarse to fine. We evaluate our model on WebAssertions (Yan et al., 2018) which is a Question aware Open IE dataset. Our model achieves a BLEU score of 59.32, which is better than previous generative methods.

Submitted 15 September, 2020; originally announced September 2020.

Comments: 11 pages, 1 figure, 4 tables

MSC Class: 68T50; 68T01

arXiv:2008.09394 [pdf, other]

A Variational Approach to Unsupervised Sentiment Analysis

Authors: Ziqian Zeng, Wenxuan Zhou, Xin Liu, Zizheng Lin, Yangqin Song, Michael David Kuo, Wan Hang Keith Chiu

Abstract: In this paper, we propose a variational approach to unsupervised sentiment analysis. Instead of using ground truth provided by domain experts, we use target-opinion word pairs as a supervision signal. For example, in a document snippet "the room is big," (room, big) is a target-opinion word pair. These word pairs can be extracted by using dependency parsers and simple rules. Our objective function is to predict an opinion word given a target word while our ultimate goal is to learn a sentiment classifier. By introducing a latent variable, i.e., the sentiment polarity, to the objective function, we can inject the sentiment classifier to the objective function via the evidence lower bound. We can learn a sentiment classifier by optimizing the lower bound. We also impose sophisticated constraints on opinion words as regularization which encourages that if two documents have similar (dissimilar) opinion words, the sentiment classifiers should produce similar (different) probability distributions. We apply our method to sentiment analysis on customer reviews and clinical narratives. The experimental results show our method can outperform unsupervised baselines in the sentiment analysis task on both domains, and our method obtains comparable results to the supervised method with hundreds of labels per aspect in the customer reviews domain, and obtains comparable results to supervised methods in the clinical narratives domain.

Submitted 21 August, 2020; originally announced August 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.05055

arXiv:1912.12418 [pdf]

Measuring group-separability in geometrical space for evaluation of pattern recognition and embedding algorithms

Authors: A. Acevedo, S. Ciucci, MJ. Kuo, C. Duran, CV. Cannistraci

Abstract: Evaluating data separation in a geometrical space is fundamental for pattern recognition. A plethora of dimensionality reduction (DR) algorithms have been developed in order to reveal the emergence of geometrical patterns in a low dimensional visible representation space, in which high-dimensional sample similarities are approximated by geometrical distances. However, statistical measures to evaluate directly in the low dimensional geometrical space the sample group separability attained by these DR algorithms are missing. Certainly, these separability measures could be used both to compare algorithm performance and to tune algorithm parameters. Here, we propose three statistical measures (named as PSI-ROC, PSI-PR, and PSI-P) that have origin from the Projection Separability (PS) rationale introduced in this study, which is expressly designed to assess group separability of data samples in a geometrical space. Traditional cluster validity indices (CVIs) might be applied in this context but they show limitations because they are not specifically tailored for DR. Our PS measures are compared to six baseline cluster validity indices, using five non-linear datasets and six different DR algorithms. The results provide clear evidence that statistical-based measures based on PS rationale are more accurate than CVIs and can be adopted to control the tuning of parameter-dependent DR algorithms.

Submitted 28 December, 2019; originally announced December 2019.

arXiv:1906.10284 [pdf, other]

Appearance and Shape from Water Reflection

Authors: Ryo Kawahara, Meng-Yu Jennifer Kuo, Shohei Nobuhara, Ko Nishino

Abstract: This paper introduces single-image geometric and appearance reconstruction from water reflection photography, i.e., images capturing direct and water-reflected real-world scenes. Water reflection offers an additional viewpoint to the direct sight, collectively forming a stereo pair. The water-reflected scene, however, includes internally scattered and reflected environmental illumination in addition to the scene radiance, which precludes direct stereo matching. We derive a principled iterative method that disentangles this scene radiometry and geometry for reconstructing 3D scene structure as well as its high-dynamic range appearance. In the presence of waves, we simultaneously recover the wave geometry as surface normal perturbations of the water surface. Most important, we show that the water reflection enables calibration of the camera. In other words, for the first time, we show that capturing a direct and water-reflected scene in a single exposure forms a self-calibrating HDR catadioptric stereo camera. We demonstrate our method on a number of images taken in the wild. The results demonstrate a new means for leveraging this accidental catadioptric camera.

Submitted 7 January, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: WACV 2020

arXiv:0712.2587 [pdf, ps, other]

Maximum-Likelihood Priority-First Search Decodable Codes for Combined Channel Estimation and Error Protection

Authors: Chia-Lung Wu, Po-Ning Chen, Yunghsiang S. Han, Ming-Hsin Kuo

Abstract: The code that combines channel estimation and error protection has received general attention recently, and has been considered a promising methodology to compensate for the multi-path fading effect. It has been shown by simulations that such code design can considerably improve the system performance over the conventional design with separate channel estimation and error protection modules under the same code rate. Nevertheless, the major obstacle that prevents the practical use of these codes is that the existing codes are mostly searched by computers, and hence exhibit no good structure for efficient decoding. Hence, the time-consuming exhaustive search becomes the only decoding choice, and the decoding complexity increases dramatically with the codeword length. In this paper, by optimizing the signal-to-noise ratio, we found a systematic construction for the codes for combined channel estimation and error protection, and confirmed its equivalence in performance to the computer-searched codes by simulations. Moreover, the structural codes that we construct by rules can now be maximum-likelihood decoded using a newly derived recursive metric with the priority-first search decoding algorithm. Thus, the decoding complexity reduces significantly when compared with that of the exhaustive decoder. The extension code design for fast-fading channels is also presented. Simulations conclude that our constructed extension code is robust in performance even if the coherent period is shorter than the codeword length.

Submitted 17 December, 2007; originally announced December 2007.

Comments: 13 figures, 2 tables

Showing 1–12 of 12 results for author: Kuo, M