-
UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization
Authors:
Md Nayem Uddin,
Amir Saeidi,
Divij Handa,
Agastya Seth,
Tran Cao Son,
Eduardo Blanco,
Steven R. Corman,
Chitta Baral
Abstract:
This paper introduces UnSeenTimeQA, a novel time-sensitive question-answering (TSQA) benchmark that diverges from traditional TSQA benchmarks by avoiding factual and web-searchable queries. We present a series of time-sensitive event scenarios decoupled from real-world factual information. The benchmark requires large language models (LLMs) to engage in genuine temporal reasoning, disassociating from the knowledge acquired during the pre-training phase. Our evaluation of six open-source LLMs (ranging from 2B to 70B in size) and three closed-source LLMs reveals that the questions from UnSeenTimeQA present substantial challenges. This indicates the models' difficulties in handling complex temporal reasoning scenarios. Additionally, we present several analyses shedding light on the models' performance in answering time-sensitive questions.
Submitted 3 July, 2024;
originally announced July 2024.
-
Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies
Authors:
Aswin RRV,
Nemika Tyagi,
Md Nayem Uddin,
Neeraj Varshney,
Chitta Baral
Abstract:
This study explores the sycophantic tendencies of Large Language Models (LLMs), where these models tend to provide answers that match what users want to hear, even if they are not entirely correct. The motivation behind this exploration stems from the common behavior observed in individuals searching the internet for facts with partial or misleading knowledge. Similar to using web search engines, users may recall fragments of misleading keywords and submit them to an LLM, hoping for a comprehensive response. Our empirical analysis of several LLMs shows the potential danger of these models amplifying misinformation when presented with misleading keywords. Additionally, we thoroughly assess four existing hallucination mitigation strategies to reduce LLMs' sycophantic behavior. Our experiments demonstrate the effectiveness of these strategies for generating factually correct statements. Furthermore, our analyses delve into knowledge-probing experiments on factual keywords and different categories of sycophancy mitigation.
Submitted 6 June, 2024;
originally announced June 2024.
-
Asking and Answering Questions to Extract Event-Argument Structures
Authors:
Md Nayem Uddin,
Enfa Rose George,
Eduardo Blanco,
Steven Corman
Abstract:
This paper presents a question-answering approach to extract document-level event-argument structures. We automatically ask and answer questions for each argument type an event may have. Questions are generated using manually defined templates and generative transformers. Template-based questions are generated using predefined role-specific wh-words and event triggers from the context document. Transformer-based questions are generated using large language models trained to formulate questions based on a passage and the expected answer. Additionally, we develop novel data augmentation strategies specialized in inter-sentential event-argument relations. We use a simple span-swapping technique, coreference resolution, and large language models to augment the training instances. Our approach enables transfer learning without any corpora-specific modifications and yields competitive results with the RAMS dataset. It outperforms previous work, and it is especially beneficial to extract arguments that appear in different sentences than the event trigger. We also present detailed quantitative and qualitative analyses shedding light on the most common errors made by our best model.
Submitted 25 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results
Authors:
Zheng Chen,
Zongwei Wu,
Eduard Zamfir,
Kai Zhang,
Yulun Zhang,
Radu Timofte,
Xiaokang Yang,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Zhijuan Huang,
Yajun Zou,
Yuan Huang,
Jiamin Lin,
Bingnan Han,
Xianyu Guan,
Yongsheng Yu,
Daoan Zhang,
Xuanwu Yin,
Kunlong Zuo,
Jinhua Hao,
Kai Zhao,
Kun Yuan,
Ming Sun,
Chao Zhou
, et al. (63 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.
Submitted 15 April, 2024;
originally announced April 2024.
-
Generating Uncontextualized and Contextualized Questions for Document-Level Event Argument Extraction
Authors:
Md Nayem Uddin,
Enfa Rose George,
Eduardo Blanco,
Steven Corman
Abstract:
This paper presents multiple question generation strategies for document-level event argument extraction. These strategies do not require human involvement and result in uncontextualized questions as well as contextualized questions grounded on the event and document of interest. Experimental results show that combining uncontextualized and contextualized questions is beneficial, especially when event triggers and arguments appear in different sentences. Our approach does not have corpus-specific components; in particular, the question generation strategies transfer across corpora. We also present a qualitative analysis of the most common errors made by our best model.
Submitted 6 April, 2024;
originally announced April 2024.
-
Securing Transactions: A Hybrid Dependable Ensemble Machine Learning Model using IHT-LR and Grid Search
Authors:
Md. Alamin Talukder,
Rakib Hossen,
Md Ashraf Uddin,
Mohammed Nasir Uddin,
Uzzal Kumar Acharjee
Abstract:
Financial institutions and businesses face an ongoing challenge from fraudulent transactions, prompting the need for effective detection methods. Detecting credit card fraud is crucial for identifying and preventing unauthorized transactions. Timely detection of fraud enables investigators to take swift actions to mitigate further losses. However, the investigation process is often time-consuming, limiting the number of alerts that can be thoroughly examined each day. Therefore, the primary objective of a fraud detection model is to provide accurate alerts while minimizing false alarms and missed fraud cases. In this paper, we introduce a state-of-the-art hybrid ensemble (ENS) dependable machine learning (ML) model that intelligently combines multiple algorithms with proper weighted optimization using grid search, including Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), and Multilayer Perceptron (MLP), to enhance fraud identification. To address the data imbalance issue, we employ the Instance Hardness Threshold (IHT) technique in conjunction with Logistic Regression (LR), surpassing conventional approaches. Our experiments are conducted on a publicly available credit card dataset comprising 284,807 transactions. The proposed model achieves impressive accuracy rates of 99.66%, 99.73%, 98.56%, and 99.79%, and a perfect 100% for the DT, RF, KNN, MLP and ENS models, respectively. The hybrid ensemble model outperforms existing works, establishing a new benchmark for detecting fraudulent transactions in high-frequency scenarios. The results highlight the effectiveness and reliability of our approach, demonstrating superior performance metrics and showcasing its exceptional potential for real-world fraud detection applications.
Submitted 22 February, 2024;
originally announced February 2024.
-
Interpreting Indirect Answers to Yes-No Questions in Multiple Languages
Authors:
Zijie Wang,
Md Mosharaf Hossain,
Shivam Mathur,
Terry Cruz Melo,
Kadir Bulut Ozler,
Keun Hee Park,
Jacob Quintero,
MohammadHossein Rezaei,
Shreya Nupur Shakya,
Md Nayem Uddin,
Eduardo Blanco
Abstract:
Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).
Submitted 20 October, 2023;
originally announced October 2023.
-
The Second Monocular Depth Estimation Challenge
Authors:
Jaime Spencer,
C. Stella Qian,
Michaela Trescakova,
Chris Russell,
Simon Hadfield,
Erich W. Graf,
Wendy J. Adams,
Andrew J. Schofield,
James Elder,
Richard Bowden,
Ali Anwar,
Hao Chen,
Xiaozhi Chen,
Kai Cheng,
Yuchao Dai,
Huynh Thai Hoa,
Sadat Hossain,
Jianmian Huang,
Mohan Jing,
Bo Li,
Chao Li,
Baojun Li,
Zhiwen Liu,
Stefano Mattoccia,
Siegfried Mercelis
, et al. (18 additional authors not shown)
Abstract:
This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks.
The challenge received eight unique submissions that outperformed the provided SotA baseline on any of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised submission improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones. These results represent significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance and overall natural image accuracy.
Submitted 26 April, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
Angular upsampling in diffusion MRI using contextual HemiHex sub-sampling in q-space
Authors:
Abrar Faiyaz,
Md Nasir Uddin,
Giovanni Schifitto
Abstract:
Artificial Intelligence (Deep Learning (DL)/Machine Learning (ML)) techniques are widely used to address ill-posed problems in medical imaging that were, or in fact still are, seemingly impossible to solve. Reducing the number of gradient directions while still harnessing high angular resolution (HAR) diffusion MR data that retains clinical features is an important and challenging problem in the field. While the DL/ML approaches are promising, it is important to incorporate relevant context for the data to ensure that maximum prior information is provided for the AI model to infer the posterior. In this paper, we introduce HemiHex (HH) subsampling to suggestively address training data sampling on q-space geometry, followed by nearest-neighbor regression training on the HH samples to finally upsample the dMRI data. Earlier studies have tried to use regression for upsampling dMRI data, but they suffer from performance issues because they fail to provide structured geometrical measures for inference. Our proposed approach is a geometrically optimized regression technique that infers the unknown q-space, thus addressing the limitations of the earlier studies.
Submitted 31 October, 2022;
originally announced November 2022.
-
Single-Shell NODDI Using Dictionary Learner Estimated Isotropic Volume Fraction
Authors:
Abrar Faiyaz,
Marvin Doyley,
Giovanni Schifitto,
Jianhui Zhong,
Md Nasir Uddin
Abstract:
Neurite orientation dispersion and density imaging (NODDI) enables the assessment of intracellular, extracellular and free water signals from multi-shell diffusion MRI data. It is an insightful approach to characterize brain tissue microstructure. Single-shell reconstruction of NODDI parameters has been discouraged in previous studies because of fitting failures, especially for the neurite density index (NDI). Here, we investigated the possibility of creating robust NODDI parameter maps with single-shell data, using the isotropic volume fraction (fISO) as a prior. Prior estimation was made independent of the NODDI model constraint using a dictionary learning approach. First, we used a stochastic sparse dictionary-based network (DictNet) to predict fISO, trained with data obtained from in vivo and simulated diffusion MRI data. In single-shell cases, the mean diffusivity (MD) and the raw T2 signal with no diffusion weighting (S0) were incorporated in the dictionary for the fISO estimation. Then, the NODDI framework was used with the known fISO to estimate the NDI and orientation dispersion index (ODI). The fISO estimated by our model was compared with other fISO estimators in the simulation. Further, using both synthetic data simulation and human data collected on a 3T scanner, we compared the performance of our dictionary-based learning prior NODDI (DLpN) with the original NODDI for both single-shell and multi-shell data. Our results suggest that DLpN-derived NDI and ODI parameters for single-shell protocols are comparable with the original multi-shell NODDI, and that the protocol with b=2000 s/mm² performs the best (error ~5% in white and grey matter). This may allow NODDI evaluation of studies on single-shell data by multi-shell scanning of two subjects for DictNet fISO training.
Submitted 4 April, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
GSM-GPRS Based Smart Street Light
Authors:
Imran Kabir,
Shihab Uddin Ahamad,
Mohammad Naim Uddin,
Shah Mohazzem Hossain,
Faija Farjana,
Partha Protim Datta,
Md. Raduanul Alam Riad,
Mohammed Hossam-E-Haider
Abstract:
Street lighting in Bangladesh has traditionally been a manual system, in which a dedicated person is posted solely to control the street lights of a zone and roams around the zonal area to switch the lights on and off twice a day. As a result, bright lights remain on in the streets even after sunrise and, in some cases, for the whole day, which adds to the budget. In addition, faulty lights may not come to the notice of the concerned authority for a long time, which is a technical downside. This paper demonstrates a process for controlling street lights in a country like Bangladesh using a SIM900 GSM-GPRS Shield, which provides manual, semi-automated, and fully automated control.
Submitted 12 January, 2021;
originally announced January 2021.
-
Comparative Mathematical Study of Blood Flow Through Stenotic and Aneurysmatic Artery with the Presence and Absence of Blood clots
Authors:
Mohammed Nasir Uddin,
M. Monir Uddin,
Md. Monjarul Alam
Abstract:
Numerical predictions of blood flow and hemodynamic properties through a stenotic and aneurysmal rigid artery are studied in the presence of a blood clot at the constricted area. The finite element method has been used to solve the steady partial differential equations of continuity, momentum, Oldroyd-B and bioheat transport in a two-dimensional Cartesian coordinate system. The present investigation carries the potential to compute blood velocity, pressure and drag coefficient, with major significance at the throat of the stenosis and aneurysm. The models are also employed to study the influence of the blood clot and the hemodynamical characteristics for all modifications. Back flow and recirculation zones are found at the stenotic and aneurysmal regions of the model. The quantitative analysis is completed by numerical calculation of the physiologically significant hemodynamical factors of blood flow, which depend on the dimensionless parameters and show the validity of the present model.
Submitted 12 May, 2020;
originally announced May 2020.
-
As You Are, So Shall You Move Your Head: A System-Level Analysis between Head Movements and Corresponding Traits and Emotions
Authors:
Sharmin Akther Purabi,
Rayhan Rashed,
Md. Mirajul Islam,
Md. Nahiyan Uddin,
Mahmuda Naznin,
A. B. M. Alim Al Islam
Abstract:
Identifying physical traits and emotions based on system-sensed physical activities is a challenging problem in the realm of human-computer interaction. Our work contributes in this context by investigating an underlying connection between head movements and corresponding traits and emotions. To do so, we utilize a head movement measuring device called eSense, which gives the acceleration and rotation of a head. First, we conduct a thorough study of head movement data collected from 46 persons using eSense while inducing five different emotional states in them in isolation. Our analysis reveals several new head-movement-based findings, which, in turn, lead us to a novel unified solution for identifying different human traits and emotions by applying machine learning techniques to head movement data. Our analysis confirms that the proposed solution can result in high accuracy on the collected data. Accordingly, we develop an integrated unified solution for real-time emotion and trait identification using head movement data, leveraging the outcomes of our analysis.
Submitted 11 October, 2019;
originally announced October 2019.
-
EdgeNet: A novel approach for Arabic numeral classification
Authors:
S. M. A. Sharif,
Ghulam Mujtaba,
S. M. Nadim Uddin
Abstract:
Despite the importance of handwritten numeral classification, a robust and effective method for a widely used language like Arabic is still lacking. This study focuses on overcoming two major limitations of existing works: data diversity and an effective learning method. Hence, the existing Arabic numeral datasets have been merged into a single dataset and augmented to introduce data diversity. Moreover, a novel deep model has been proposed to exploit the diverse data samples of the unified dataset. The proposed deep model utilizes the low-level edge features by propagating them through residual connection. To make a fair comparison with the proposed model, the existing works have been studied under the unified dataset. The comparison experiments illustrate that the unified dataset improves the performance of the existing works. Moreover, the proposed model outperforms the existing state-of-the-art Arabic handwritten numeral classification methods and obtains an accuracy of 99.59% in the validation phase. Apart from that, different state-of-the-art classification models have been studied with the same dataset to reveal their feasibility for Arabic numeral classification. Code available at http://github.com/sharif-apu/EdgeNet.
Submitted 30 July, 2019;
originally announced August 2019.
-
A Constructive Equivalence between Computation Tree Logic and Failure Trace Testing
Authors:
Stefan D. Bruda,
Sunita Singh,
A. F. M. Nokib Uddin,
Zhiyu Zhang,
Rui Zuo
Abstract:
The two major systems of formal verification are model checking and algebraic model-based testing. Model checking is based on some form of temporal logic such as linear temporal logic (LTL) or computation tree logic (CTL). One powerful and realistic logic being used is CTL, which is capable of expressing most interesting properties of processes such as liveness and safety. Model-based testing is based on some operational semantics of processes (such as traces, failures, or both) and its associated preorders. The most fine-grained preorder beside bisimulation (mostly of theoretical importance) is based on failure traces. We show that these two most powerful variants are equivalent; that is, we show that for any failure trace test there exists a CTL formula equivalent to it, and the other way around. All our proofs are constructive and algorithmic. Our result allows for parts of a large system to be specified logically while other parts are specified algebraically, thus combining the best of the two (logic and algebraic) worlds.
Submitted 30 January, 2019;
originally announced January 2019.
-
An Efficient Approach towards Mitigating Soft Errors Risks
Authors:
Muhammad Sheikh Sadi,
Md. Mizanur Rahman Khan,
Md. Nazim Uddin,
Jan Jürjens
Abstract:
Smaller feature size, higher clock frequency and lower power consumption are core concerns of today's nanotechnology, which have resulted from the continuous downscaling of CMOS technologies. The resultant 'device shrinking' reduces the soft error tolerance of VLSI circuits, as very little energy is needed to change their states. Safety-critical systems are very sensitive to soft errors. A bit flip due to a soft error can change the value of a critical variable, and consequently the system control flow can be completely changed, which leads to system failure. To minimize soft error risks, a novel methodology is proposed to detect and recover from soft errors considering only 'critical code blocks' and 'critical variables' rather than considering all variables and/or blocks in the whole program. The proposed method reduces space and time overhead in comparison to existing dominant approaches.
Submitted 17 October, 2011;
originally announced October 2011.