Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim

Abstract: Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors. This is achieved by generating code in response to a given question using different variants. We collected a dataset comprising 5,069 samples, with each sample consisting of a textual description of a coding problem and its corresponding human-written Python solution codes. These samples were obtained from various sources, including 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which were used to instruct ChatGPT to generate the outputs. Subsequently, we assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 11 pages, paper accepted at 46th International Conference on Software Engineering, Software Engineering Education and Training Track (ICSE-SEET 2024)

arXiv:2305.17445 [pdf, other]

Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing

Authors: Julia Kaiwen Lau, Kelvin Kai Wen Kong, Julian Hao Yong, Per Hoong Tan, Zhou Yang, Zi Qian Yong, Joshua Chern Wey Low, Chun Yong Chong, Mei Kuan Lim, David Lo

Abstract: Recent studies have proposed the use of Text-To-Speech (TTS) systems to automatically synthesise speech test cases at scale and uncover a large number of failures in ASR systems. However, the failures uncovered by synthetic test cases may not reflect the actual performance of an ASR system when it transcribes human audio, which we refer to as false alarms. Given a failed test case synthesised from TTS systems, which consists of TTS-generated audio and the corresponding ground truth text, we feed the human audio stating the same text to an ASR system. If human audio can be correctly transcribed, an instance of a false alarm is detected. In this study, we investigate false alarm occurrences in five popular ASR systems using synthetic audio generated from four TTS systems and human audio obtained from two commonly used datasets. Our results show that the fewest false alarms are identified when testing Deepspeech, and the number of false alarms is the highest when testing Wav2vec2. On average, false alarm rates range from 21% to 34% in all five ASR systems. Among the four TTS systems used, Google TTS produces the fewest false alarms (17%), and Espeak TTS produces the most (32%). Additionally, we build a false alarm estimator that flags potential false alarms, which achieves promising results: a precision of 98.3%, a recall of 96.4%, an accuracy of 98.5%, and an F1 score of 97.3%.
Our study provides insight into the appropriate selection of TTS systems to generate high-quality speech to test ASR systems. Additionally, a false alarm estimator can be a way to minimise the impact of false alarms and help developers choose suitable test inputs when evaluating ASR systems. The source code used in this paper is publicly available on GitHub at https://github.com/julianyonghao/FAinASRtest.
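
The false-alarm check described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `FailedCase` record, the `normalise` helper, and the exact-match criterion after case and whitespace normalisation are all assumptions introduced here, and the transcripts are made-up toy strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailedCase:
    """A test case that failed on TTS audio (hypothetical record layout)."""
    text: str              # ground-truth text
    tts_transcript: str    # ASR output for the TTS-generated audio
    human_transcript: str  # ASR output for human audio of the same text


def normalise(s: str) -> str:
    # Assumed matching rule: ignore case and repeated whitespace.
    return " ".join(s.lower().split())


def is_false_alarm(case: FailedCase) -> bool:
    # The failure is a false alarm if the human audio is transcribed correctly.
    return normalise(case.human_transcript) == normalise(case.text)


def false_alarm_rate(cases: List[FailedCase]) -> float:
    # Fraction of failed TTS test cases that are false alarms.
    if not cases:
        return 0.0
    return sum(is_false_alarm(c) for c in cases) / len(cases)


# Toy data, invented for illustration only.
cases = [
    FailedCase("hello world", "hollow world", "Hello  world"),
    FailedCase("open the door", "open the store", "open the floor"),
]
rate = false_alarm_rate(cases)
```

In this toy input the first case counts as a false alarm (the human audio is transcribed correctly) and the second does not, so `rate` is 0.5.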

Submitted 18 July, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: 13 pages, Accepted at ISSTA2023

arXiv:2303.06283 [pdf, other]

Closing the Loop for Software Remodularisation -- REARRANGE: An Effort Estimation Approach for Software Clustering-based Remodularisation

Authors: Alvin Jian Jia Tan, Chun Yong Chong, Aldeida Aleti

Abstract: Software remodularization through clustering is a common practice to improve internal software quality. However, the true benefit of software clustering is only realized if developers follow through with the recommended refactoring suggestions, which can be complex and time-consuming. Simply producing clustering results is not enough to realize the benefits of remodularization. For the recommended refactoring operations to have an impact, developers must follow through with them. However, this is often a difficult task due to certain refactoring operations' complexity and time-consuming nature.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted for publication at ICSE23 Poster Track

arXiv:2303.04566 [pdf, other]

Robustness Evaluation in Hand Pose Estimation Models using Metamorphic Testing

Authors: Muxin Pu, Chun Yong Chong, Mei Kuan Lim

Abstract: Hand pose estimation (HPE) is a task that predicts and describes the hand poses from images or video frames. When HPE models estimate hand poses captured in a laboratory or under controlled environments, they normally deliver good performance. However, the real-world environment is complex, and various uncertainties may happen, which could degrade the performance of HPE models. For example, the hands could be occluded, the visibility of hands could be reduced by imperfect exposure rate, and the contour of hands is prone to be blurred during fast hand movements. In this work, we adopt metamorphic testing to evaluate the robustness of HPE models and provide suggestions on the choice of HPE models for different applications. The robustness evaluation was conducted on four state-of-the-art models, namely MediaPipe hands, OpenPose, BodyHands, and NSRM hand. We found that on average more than 80% of the hands could not be identified by BodyHands, and at least 50% of hands could not be identified by MediaPipe hands when diagonal motion blur is introduced, while an average of more than 50% of strongly underexposed hands could not be correctly estimated by NSRM hand. Similarly, applying occlusions on only four hand joints will also largely degrade the performance of these models. The experimental results show that occlusions, illumination variations, and motion blur are the main obstacles to the performance of existing HPE models.
These findings may pave the way for researchers to improve the performance and robustness of hand pose estimation models and their applications.

Submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted at 2023 8th International Workshop on Metamorphic Testing, 8 pages

arXiv:2302.05582 [pdf, other]

ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems

Authors: Daniel Hao Xian Yuen, Andrew Yong Chen Pang, Zhou Yang, Chun Yong Chong, Mei Kuan Lim, David Lo

Abstract: Recent years have witnessed wider adoption of Automated Speech Recognition (ASR) techniques in various domains. Consequently, evaluating and enhancing the quality of ASR systems is of great importance. This paper proposes ASDF, an Automated Speech Recognition Differential Testing Framework for testing ASR systems. ASDF extends an existing ASR testing tool, CrossASR++, which synthesizes test cases from a text corpus. However, CrossASR++ fails to make use of the text corpus efficiently and provides limited information on how the failed test cases can improve ASR systems. To address these limitations, our tool incorporates two novel features: (1) a text transformation module to boost the number of generated test cases and uncover more errors in ASR systems and (2) a phonetic analysis module to identify on which phonemes the ASR system tends to produce errors. ASDF generates more high-quality test cases by applying various text transformation methods (e.g., change tense) to the texts in failed test cases. By doing so, ASDF can utilize a small text corpus to generate a large number of audio test cases, something which CrossASR++ is not capable of. In addition, ASDF implements more metrics to evaluate the performance of ASR systems from multiple perspectives. ASDF performs phonetic analysis on the identified failed test cases to identify the phonemes that ASR systems tend to transcribe incorrectly, providing useful information for developers to improve ASR systems.
The demonstration video of our tool is available online at https://www.youtube.com/watch?v=DzVwfc3h9As. The implementation is available at https://github.com/danielyuenhx/asdf-differential-testing.

Submitted 10 February, 2023; originally announced February 2023.

Comments: Accepted by ICST 2023 Tool Demo Track

arXiv:2301.05069 [pdf, ps, other]

Open Design Case Study -- A Crowdsourcing Effort to Curate Software Design Case Studies

Authors: Chun Yong Chong, Eunsuk Kang, Mary Shaw

Abstract: Case study-based learning has been successfully integrated into various courses, including software engineering education. In the context of software design courses, the use of case studies often entails sharing of real successful or failed software development. Using examples of real-world case studies allows educators to reinforce the applicability and usefulness of fundamental design concepts, relate the importance of evaluating design trade-offs with respect to stakeholders' requirements, and highlight the importance of upfront design, which students who lack industrial experience tend to overlook. However, the use of real-world case studies is not straightforward because (1) there is a lack of open source repositories for real software design case studies and (2) even if case studies are available, they are often reported without a standardized format, which may hinder the alignment between the case and the desired learning outcomes. To address the lack of software design case studies for educational purposes, we propose the idea of Open Design Case Study, a repository to crowdsource, curate, and recruit other educators to contribute case studies for teaching software design courses. The platform will also allow educators and students to share, brainstorm, and discuss design solutions based on case studies shared publicly on the repository.

Submitted 12 January, 2023; originally announced January 2023.

Comments: 6 pages, accepted at ICSE-SEET2023

arXiv:2207.14535 [pdf, other]

doi 10.1007/978-3-031-37660-3_43

SERCNN: Stacked Embedding Recurrent Convolutional Neural Network in Detecting Depression on Twitter

Authors: Heng Ee Tay, Mei Kuan Lim, Chun Yong Chong

Abstract: Conventional approaches to identify depression are not scalable, and the public has limited awareness of mental health, especially in developing countries. As evidenced by recent studies, social media has the potential to complement mental health screening on a greater scale. The vast amount of first-person narrative posts in chronological order can provide insights into one's thoughts, feelings, behavior, or mood for some time, enabling a better understanding of depression symptoms reflected in the online space. In this paper, we propose SERCNN, which improves the user representation by (1) stacking two pretrained embeddings from different domains and (2) reintroducing the embedding context to the MLP classifier. Our SERCNN shows great performance over state-of-the-art and other baselines, achieving 93.7% accuracy in a 5-fold cross-validation setting. Since not all users share the same level of online activity, we introduced the concept of a fixed observation window that quantifies the observation period in a predefined number of posts. With as few as 10 posts per user, SERCNN performed exceptionally well with an 87% accuracy, which is on par with the BERT model, while having 98% fewer parameters. Our findings open up a promising direction for detecting depression on social media with a smaller number of posts for inference, towards creating solutions for a cost-effective and timely intervention. We hope that our work can bring this research area closer to real-world adoption in existing clinical practice.

Submitted 5 August, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: This paper has been accepted at the AIHA 2022 workshop of the ICPR 2022 conference

arXiv:2204.08612 [pdf, other]

Metamorphic Testing-based Adversarial Attack to Fool Deepfake Detectors

Authors: Nyee Thoang Lim, Meng Yi Kuan, Muxin Pu, Mei Kuan Lim, Chun Yong Chong

Abstract: Deepfakes utilise Artificial Intelligence (AI) techniques to create synthetic media where the likeness of one person is replaced with another. There are growing concerns that deepfakes can be maliciously used to create misleading and harmful digital contents. As deepfakes become more common, there is a dire need for deepfake detection technology to help spot deepfake media. Present deepfake detection models are able to achieve outstanding accuracy (>90%). However, most of them are limited to within-dataset scenario, where the same dataset is used for training and testing. Most models do not generalise well enough in cross-dataset scenario, where models are tested on unseen datasets from another source. Furthermore, state-of-the-art deepfake detection models rely on neural network-based classification models that are known to be vulnerable to adversarial attacks. Motivated by the need for a robust deepfake detection model, this study adapts metamorphic testing (MT) principles to help identify potential factors that could influence the robustness of the examined model, while overcoming the test oracle problem in this domain. Metamorphic testing is specifically chosen as the testing technique as it fits our demand to address learning-based system testing with probabilistic outcomes from largely black-box components, based on potentially large input domains. We performed our evaluations on MesoInception-4 and TwoStreamNet models, which are the state-of-the-art deepfake detection models.
This study identified makeup application as an adversarial attack that could fool deepfake detectors. Our experimental results demonstrate that both the MesoInception-4 and TwoStreamNet models degrade in their performance by up to 30% when the input data is perturbed with makeup.

Submitted 31 May, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

Comments: Paper accepted at the 26th International Conference on Pattern Recognition (ICPR2022)

arXiv:2203.06825 [pdf, other]

Fairness Evaluation in Deepfake Detection Models using Metamorphic Testing

Authors: Muxin Pu, Meng Yi Kuan, Nyee Thoang Lim, Chun Yong Chong, Mei Kuan Lim

Abstract: Fairness of deepfake detectors in the presence of anomalies is not well investigated, especially if those anomalies are more prominent in either male or female subjects. The primary motivation for this work is to evaluate how a deepfake detection model behaves under such anomalies. However, due to the black-box nature of deep learning (DL) and artificial intelligence (AI) systems, it is hard to predict the performance of a model when the input data is modified. Crucially, if this defect is not addressed properly, it will adversely affect the fairness of the model and result in unintentional discrimination against certain sub-populations. Therefore, the objective of this work is to adopt metamorphic testing to examine the reliability of the selected deepfake detection model, and how the transformation of input variation influences the output. We have chosen MesoInception-4, a state-of-the-art deepfake detection model, as the target model and makeup as the anomalies. Makeup is applied by using the Dlib library to obtain the 68 facial landmarks prior to filling in the RGB values. Metamorphic relations are derived based on the notion that realistic perturbations of the input images, such as makeup involving eyeliners, eyeshadows, blushes, and lipsticks (which are common cosmetic appearance) applied to male and female images, should not alter the output of the model by a huge margin. Furthermore, we narrow down the scope to focus on revealing potential gender biases in DL and AI systems.
Specifically, we are interested in examining whether the MesoInception-4 model produces unfair decisions, which should be considered as a consequence of robustness issues. The findings from our work have the potential to pave the way for new research directions in the quality assurance and fairness in DL and AI systems.
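
The kind of metamorphic relation this abstract describes (a realistic perturbation should not change the model output by a large margin) can be sketched as a simple check. This is an illustrative sketch only: the `violates_relation` function, the `tolerance` value of 0.1, the toy model, and the toy transforms are hypothetical stand-ins, not MesoInception-4 or the paper's makeup pipeline.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in: a flat list of pixel intensities in [0, 1]


def violates_relation(
    model: Callable[[Image], float],
    transform: Callable[[Image], Image],
    image: Image,
    tolerance: float = 0.1,  # assumed threshold, not taken from the paper
) -> bool:
    """Return True if the perturbed output drifts more than `tolerance`."""
    original = model(image)
    perturbed = model(transform(image))
    return abs(perturbed - original) > tolerance


def toy_model(image: Image) -> float:
    # Hypothetical "detector score": the mean pixel intensity.
    return sum(image) / len(image)


def mild_perturbation(image: Image) -> Image:
    # Small, realistic change: brighten every pixel slightly.
    return [min(1.0, p + 0.05) for p in image]


def extreme_perturbation(image: Image) -> Image:
    # Unrealistically large change: saturate every pixel.
    return [1.0 for _ in image]


image = [0.2, 0.4, 0.6]
mild_violation = violates_relation(toy_model, mild_perturbation, image)
extreme_violation = violates_relation(toy_model, extreme_perturbation, image)
```

With these toy definitions the mild perturbation shifts the score by about 0.05 and stays within the tolerance, while the saturating transform shifts it by 0.6 and is flagged as a violation. No ground-truth label is needed for either check, which is the point of using a metamorphic relation where a test oracle is unavailable.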

Submitted 13 March, 2022; originally announced March 2022.

Comments: 8 pages, accepted at 7th International Workshop on Metamorphic Testing (MET22)

arXiv:2107.01766 [pdf, other]

E-SC4R: Explaining Software Clustering for Remodularisation

Authors: Alvin Jian Jia Tan, Chun Yong Chong, Aldeida Aleti

Abstract: Maintenance of existing software requires a large amount of time for comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representation of the software design. Due to the diverse domains, structure, and behaviour of software systems, the suitability of different clustering algorithms for different software systems has not been investigated thoroughly. Research that introduces new clustering techniques usually validates its approach on a specific domain, which might limit its generalisability. If the chosen test subjects could only represent a narrow perspective of the whole picture, researchers might risk not being able to address the external validity of their findings. This work aims to fill this gap by introducing a new approach, Explaining Software Clustering for Remodularisation, to evaluate the effectiveness of different software clustering approaches. This work focuses on hierarchical clustering and Bunch clustering algorithms and provides information about their suitability according to the features of the software, which as a consequence, enables the selection of the optimal algorithm and configuration from our existing pool of choices for a particular software system.
The proposed framework is tested on 30 open source software systems with varying sizes and domains, and demonstrates that it can characterise both the strengths and weaknesses of the analysed software clustering algorithms using software features extracted from the code. The proposed approach also provides a better understanding of the algorithms' behaviour through the application of dimensionality reduction techniques.

Submitted 2 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: 31 pages

arXiv:2106.07513 [pdf, other]

CodeLabeller: A Web-based Code Annotation Tool for Java Design Patterns and Summaries

Authors: Najam Nazar, Norman Chen, Chun Yong Chong

Abstract: While constructing supervised learning models, we require labelled examples to build a corpus and train a machine learning model. However, most studies have built the labelled dataset manually, which on many occasions is a daunting task. To mitigate this problem, we have built an online tool called CodeLabeller. CodeLabeller is a web-based tool that aims to provide an efficient approach to handling the process of labelling source code files for supervised learning methods at scale by improving the data collection process throughout. CodeLabeller is tested by constructing a corpus of over a thousand source files obtained from a large collection of open source Java projects and labelling each Java source file with their respective design patterns and summaries. Twenty-five experts in the field of software engineering participated in a usability evaluation of the tool using the standard User Experience Questionnaire online survey. The survey results demonstrate that the tool achieves the Good standard on hedonic and pragmatic quality standards, is easy to use and meets the needs of annotating the corpus for supervised classifiers. Apart from assisting researchers in crowdsourcing a labelled dataset, the tool has practical applicability in software engineering education and assists in building expert ratings for software artefacts.

Submitted 13 March, 2023; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: 15 pages, 5 Figures, 6 Tables

arXiv:2103.02870 [pdf, other]

doi 10.1109/MET52542.2021.00008

Robustness Evaluation of Stacked Generative Adversarial Networks using Metamorphic Testing

Authors: Hyejin Park, Taaha Waseem, Wen Qi Teo, Ying Hwei Low, Mei Kuan Lim, Chun Yong Chong

Abstract: Synthesising photo-realistic images from natural language is one of the challenging problems in computer vision. Over the past decade, a number of approaches have been proposed, of which the improved Stacked Generative Adversarial Network (StackGAN-v2) has proven capable of generating high resolution images that reflect the details specified in the input text descriptions. In this paper, we aim to assess the robustness and fault-tolerance capability of the StackGAN-v2 model by introducing variations in the training data. However, due to the working principle of Generative Adversarial Network (GAN), it is difficult to predict the output of the model when the training data are modified. Hence, in this work, we adopt the Metamorphic Testing technique to evaluate the robustness of the model with a variety of unexpected training datasets. As such, we first implement the StackGAN-v2 algorithm and test the pre-trained model provided by the original authors to establish a ground truth for our experiments. We then identify a metamorphic relation, from which test cases are generated. Further, metamorphic relations were derived successively based on the observations of prior test results. Finally, we synthesise the results from our experiment of all the metamorphic relations and find that the StackGAN-v2 algorithm is susceptible to input images with obtrusive objects, even if they overlap with the main object minimally, which was not reported by the authors and users of the StackGAN-v2 model.
The proposed metamorphic relations can be applied to other text-to-image synthesis models to not only verify the robustness but also to help researchers understand and interpret the results made by the machine learning models.

Submitted 2 October, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: 8 pages, accepted at the 6th International Workshop on Metamorphic Testing (MET'21)

arXiv:2101.04837 [pdf, other]

Assessing the Students' Understanding and their Mistakes in Code Review Checklists -- An Experience Report of 1,791 Code Review Checklist Questions from 394 Students

Authors: Chun Yong Chong, Patanamon Thongtanunam, Chakkrit Tantithamthavorn

Abstract: Code review is a widely-used practice in software development companies to identify defects. Hence, code review has been included in many software engineering curricula at universities worldwide. However, teaching code review is still a challenging task because the code review effectiveness depends on the code reading and analytical skills of a reviewer. While several studies have investigated the code reading techniques that students should use to find defects during code review, little work has focused on a learning activity that involves analytical skills. Indeed, developing a code review checklist should stimulate students to develop their analytical skills to anticipate potential issues (i.e., software defects). Yet, it is unclear whether students can anticipate potential issues given their limited experience in software development (programming, testing, etc.). We perform a qualitative analysis to investigate whether students are capable of creating code review checklists, and if the checklists can be used to guide reviewers to find defects. In addition, we identify common mistakes that students make when developing a code review checklist. Our results show that while there are some misconceptions among students about the purpose of code review, students are able to anticipate potential defects and create a relatively good code review checklist. Hence, our results lead us to conclude that developing a code review checklist can be a part of the learning activities for code review in order to scaffold students' skills.

Submitted 12 January, 2021; originally announced January 2021.

Comments: 10 pages, accepted at the International Conference on Software Engineering: Joint Track on Software Engineering Education and Training Track (ICSE'21-JSEET)

arXiv:1304.3427 [pdf]

Metaprobability and Dempster-Shafer in Evidential Reasoning

Authors: Robert Fung, Chee Yee Chong

Abstract: Evidential reasoning in expert systems has often used ad-hoc uncertainty calculi. Although it is generally accepted that probability theory provides a firm theoretical foundation, researchers have found some problems with its use as a workable uncertainty calculus. Among these problems are representation of ignorance, consistency of probabilistic judgements, and adjustment of a priori judgements with experience. The application of metaprobability theory to evidential reasoning is a new approach to solving these problems. Metaprobability theory can be viewed as a way to provide soft or hard constraints on beliefs in much the same manner as the Dempster-Shafer theory provides constraints on probability masses on subsets of the state space. Thus, we use the Dempster-Shafer theory, an alternative theory of evidential reasoning to illuminate metaprobability theory as a theory of evidential reasoning. The goal of this paper is to compare how metaprobability theory and Dempster-Shafer theory handle the adjustment of beliefs with evidence with respect to a particular thought experiment. Sections 2 and 3 give brief descriptions of the metaprobability and Dempster-Shafer theories. Metaprobability theory deals with higher order probabilities applied to evidential reasoning. Dempster-Shafer theory is a generalization of probability theory which has evolved from a theory of upper and lower probabilities. Section 4 describes a thought experiment and the metaprobability and Dempster-Shafer analysis of the experiment.
The thought experiment focuses on forming beliefs about a population with 6 types of members {1, 2, 3, 4, 5, 6}. A type is uniquely defined by the values of three features: A, B, C. That is, if the three features of one member of the population were known then its type could be ascertained. Each of the three features has two possible values (e.g. A can be either "a0" or "a1"). Beliefs are formed from evidence accrued from two sensors: sensor A and sensor B. Each sensor senses the corresponding defining feature. Sensor A reports that half of its observations are "a0" and half the observations are "a1". Sensor B reports that half of its observations are "b0" and half are "b1". Based on these two pieces of evidence, what should be the beliefs on the distribution of types in the population? Note that the third feature is not observed by any sensor.

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the First Conference on Uncertainty in Artificial Intelligence (UAI1985)

Report number: UAI-P-1985-PG-76-83

Showing 1–14 of 14 results for author: Chong, C Y