-
ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPT
Authors:
Azmain Kabir,
Shaowei Wang,
Yuan Tian,
Tse-Hsun Chen,
Muhammad Asaduzzaman,
Wenbin Zhang
Abstract:
Technical question and answer (Q&A) sites such as Stack Overflow have become an important source of knowledge for software developers. However, code snippets on Q&A sites are usually uncompilable and semantically incomplete due to unresolved types and missing dependent libraries, which creates an obstacle for users who want to reuse or analyze them. Prior approaches either were not designed for synthesizing compilable code or suffer from a low compilation success rate. To address this problem, we propose ZS4C, a lightweight approach to perform zero-shot synthesis of compilable code from incomplete code snippets using a Large Language Model (LLM). ZS4C operates in two stages. In the first stage, ZS4C uses an LLM, i.e., ChatGPT, to identify missing import statements for a given code snippet, leveraging our task-specific prompt template. In the second stage, ZS4C fixes compilation errors caused by incorrect import statements and syntax errors through collaboration between ChatGPT and a compiler. We thoroughly evaluated ZS4C on a widely used benchmark called StatType-SO against the state-of-the-art (SOTA) approach SnR. Compared with SnR, ZS4C improves the compilation rate from 63% to 87.6%, a 39.3% improvement. On average, ZS4C also infers more accurate import statements than SnR, with a 6.6% improvement in F1 score.
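The two-stage flow (import inference, then compiler-guided repair) can be sketched as a small driver loop. This is a hypothetical sketch, not ZS4C's implementation: `infer_imports`, `compile_code`, and `fix_code` are assumed stand-ins for the ChatGPT import-inference prompt, a Java compiler invocation, and the ChatGPT repair prompt, and the `max_rounds` cap is illustrative.

```python
from typing import Callable, List, Tuple


def synthesize_compilable(
    snippet: str,
    infer_imports: Callable[[str], List[str]],   # hypothetical: stage-1 LLM call returning import lines
    compile_code: Callable[[str], List[str]],    # hypothetical: returns compiler error messages ([] = success)
    fix_code: Callable[[str, List[str]], str],   # hypothetical: stage-2 LLM repair given the errors
    max_rounds: int = 5,                         # illustrative cap on repair iterations
) -> Tuple[str, bool]:
    """Return (code, compiled_ok) after import inference and compiler-guided repair."""
    # Stage 1: prepend the import statements suggested for the snippet.
    code = "\n".join(infer_imports(snippet) + [snippet])

    # Stage 2: feed compiler errors back to the repair step until the code
    # compiles or the round budget is exhausted.
    for _ in range(max_rounds):
        errors = compile_code(code)
        if not errors:
            return code, True
        code = fix_code(code, errors)
    return code, not compile_code(code)
```

For example, with a stub compiler that rejects any code lacking `import java.util.ArrayList;` and a stub repair step that prepends it, the loop returns the repaired code and `True` after one repair round.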
Submitted 25 January, 2024;
originally announced January 2024.
-
An improved method of delta summation for faster current value selection across filtered subsets of interval and temporal relational data
Authors:
Derek Colley,
Md Asaduzzaman
Abstract:
Aggregation in relational databases is accomplished through hashing and sorting interval data, which is computationally expensive and scales poorly as data volumes grow.
In this paper, we show how quantitative interval and time-series data in relational attributes can be represented using delta summary values rather than absolute values. This removes the need to sort in order to find the row corresponding to some maximum timestamp, reducing the time complexity from at least O(n log(n)) towards O(n) and improving query execution times. We illustrate this new method in the relational algebra, present the implementation algorithmically, and test an implementation in two leading RDBMS products against the standard equivalents.
We found this delta summation technique to be most effective for use cases with additive, numerical data for which the latest values must be obtained frequently, and where row cardinalities are on the order of 10^5. In our experiments, the proposed delta summation technique executed up to 22.4% faster than the equivalent standard selection method, reduced overall query cost in some circumstances by up to 24.0%, and reduced I/O load by up to 60.6%. These gains came with higher query costs for write operations, increased CPU time and memory allocation, uncertain performance with very low or very high cardinalities, and inconsistent results across different RDBMS platforms.
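The core idea can be illustrated outside an RDBMS: if each row stores the change from the previous reading of the same key, the current value of any key is a plain sum over an unsorted scan, so no ordering by timestamp is needed at read time. This is a minimal in-memory sketch under stated assumptions (additive numeric values, rows written in timestamp order per key); it is not the paper's relational-algebra formulation or its SQL implementation, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Row = Tuple[str, int, float]  # (key, timestamp, value)


def to_delta_rows(rows: Iterable[Row]) -> List[Row]:
    """Write path: store each reading as its change from the key's previous reading.

    Assumes rows arrive in timestamp order per key; maintaining this at write
    time is where the extra write cost comes from.
    """
    last: Dict[str, float] = {}
    deltas: List[Row] = []
    for key, ts, value in rows:
        deltas.append((key, ts, value - last.get(key, 0)))
        last[key] = value
    return deltas


def current_by_latest_timestamp(rows: Iterable[Row]) -> Dict[str, float]:
    """Conventional read path: sort by timestamp (O(n log n)) and keep the last value per key."""
    latest: Dict[str, float] = {}
    for key, _ts, value in sorted(rows, key=lambda r: r[1]):
        latest[key] = value
    return latest


def current_by_delta_sum(delta_rows: Iterable[Row], keys: Optional[Set[str]] = None) -> Dict[str, float]:
    """Delta read path: one unsorted O(n) pass, optionally over a filtered subset of keys."""
    totals: Dict[str, float] = defaultdict(float)
    for key, _ts, delta in delta_rows:
        if keys is None or key in keys:
            totals[key] += delta
    return dict(totals)
```

Both read paths agree on the current values; the delta path trades a cheaper read for the bookkeeping on every write, which mirrors the trade-off reported in the abstract.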
Submitted 10 November, 2022;
originally announced November 2022.
-
Internet of Things (IoT) based ECG System for Rural Health Care
Authors:
Md. Obaidur Rahman,
Mohammod Abul Kashem,
Al-Akhir Nayan,
Most. Fahmida Akter,
Fazly Rabbi,
Marzia Ahmed,
Mohammad Asaduzzaman
Abstract:
Nearly 30% of the people in rural areas of Bangladesh live below the poverty level. Moreover, because modern healthcare technology is unavailable, nursing and diagnosis facilities for rural people are limited, and they are deprived of proper healthcare. Modern technology can help mitigate these health problems. ECG sensing tools are attached to the patient's chest, and the required cardiovascular data are collected through an IoT device. These data are stored in the cloud through MQTT and HTTP servers. This study proposes an innovative IoT-based ECG monitoring system for cardiovascular or heart patients. The ECG signal parameters P, Q, R, S, and T are collected, pre-processed, and used for prediction to monitor cardiovascular conditions for further health management. A machine learning algorithm is used to determine the significance of the ECG signal parameters and the error rate. The logistic regression model showed better agreement between the training and test data. Prediction was performed to determine the variation in PQRST quality and its suitability for the ECG monitoring system. Considering the values of the quality parameters, satisfactory results were obtained. The proposed IoT-based ECG system can reduce health care costs and the complexity of managing cardiovascular diseases in the future.
Submitted 26 July, 2022;
originally announced August 2022.
-
Optimal Grain Mixing is NP-Complete
Authors:
Md Asaduzzaman Noor,
Sean Yaw,
Binhai Zhu,
John W. Sheppard
Abstract:
Protein content in wheat plays a significant role in determining its price. The grain mixing problem aims to find the optimal combination of bin pairs, with an appropriate mixing ratio, to load each truck so as to maximize profit when the grain is sold to a set of local grain elevators. In this paper, we present two complexity proofs for the grain mixing problem and show that finding optimal solutions for it remains hard. The proofs use reductions from the 3-dimensional matching (3-DM) problem and from a more restricted version of 3-DM known as planar 3-DM, respectively. These complexity results suggest that an exact algorithm for finding the optimal solution to the grain mixing problem may be infeasible.
Submitted 15 December, 2021;
originally announced December 2021.
-
Fast Clustering of Short Text Streams Using Efficient Cluster Indexing and Dynamic Similarity Thresholds
Authors:
Md Rashadul Hasan Rakib,
Muhammad Asaduzzaman
Abstract:
Short text stream clustering is an important but challenging task, since massive amounts of text are generated by sources such as micro-blogging, question-answering, and social news aggregation websites. One of the major challenges is clustering such massive amounts of text within a reasonable amount of time. Existing state-of-the-art short text stream clustering methods cannot do so, because they compute the similarity between a text and every existing cluster to assign that text to a cluster. To overcome this challenge, we propose a fast short text stream clustering method, called FastStream, that efficiently indexes the clusters using an inverted index and, when assigning a text, computes its similarity only to a selected number of clusters. In this way, we reduce the running time not only of our proposed method but also of several state-of-the-art short text stream clustering methods. FastStream assigns a text to a new or existing cluster using similarity thresholds computed dynamically from a statistical measure, which allows it to deal efficiently with the concept drift problem. Experimental results demonstrate that FastStream outperforms the state-of-the-art short text stream clustering methods by a significant margin on several short text datasets. In addition, FastStream runs several orders of magnitude faster than the state-of-the-art methods.
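The two ideas in the abstract, comparing a text only with clusters reached through an inverted index and accepting a match against a threshold derived from past similarities, can be sketched as follows. This is a toy illustration, not FastStream itself: the cosine similarity over raw term counts, the threshold rule (mean of the similarities of past assignments), and the `initial_threshold` default are all assumptions, since the abstract does not specify the statistical measure.

```python
import math
import statistics
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set


class StreamClusterer:
    """Toy short-text stream clusterer with an inverted index over clusters."""

    def __init__(self, initial_threshold: float = 0.3):
        self.centroids: Dict[int, Counter] = {}             # cluster id -> term counts
        self.index: Dict[str, Set[int]] = defaultdict(set)  # term -> ids of clusters containing it
        self.history: List[float] = []                      # similarities of past assignments
        self.initial_threshold = initial_threshold          # assumed value used before any history exists
        self.next_id = 0

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(v * b[t] for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, tokens: List[str]) -> int:
        vec = Counter(tokens)
        # Only clusters sharing at least one term with the text are compared.
        candidates: Set[int] = set()
        for term in vec:
            candidates |= self.index.get(term, set())
        best_id: Optional[int] = None
        best_sim = 0.0
        for cid in candidates:
            sim = self._cosine(vec, self.centroids[cid])
            if sim > best_sim:
                best_id, best_sim = cid, sim
        # Dynamic threshold (assumption): mean similarity of past assignments.
        threshold = statistics.mean(self.history) if self.history else self.initial_threshold
        if best_id is not None and best_sim >= threshold:
            cid = best_id
            self.history.append(best_sim)
        else:
            cid = self.next_id  # no sufficiently similar cluster: open a new one
            self.next_id += 1
            self.centroids[cid] = Counter()
        self.centroids[cid].update(vec)
        for term in vec:
            self.index[term].add(cid)
        return cid
```

Feeding the texts "python list sort", "python list append", "java thread", and "java thread pool" yields cluster ids 0, 0, 1, 1; the third text is never compared with cluster 0 because it shares no indexed term with it.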
Submitted 21 January, 2021;
originally announced January 2021.
-
Socio-network Analysis of RTL Designs for Hardware Trojan Localization
Authors:
Sheikh Ariful Islam,
Farha Islam Mime,
S M Asaduzzaman,
Farzana Islam
Abstract:
The recent surge of interest in hardware security is driven by the offshoring of proprietary intellectual property (IP). One distinct dimension of this disruptive threat is malicious logic insertion, also known as a Hardware Trojan (HT), which stealthily subverts the normal operation of a device. Because HTs vary widely in their activation mechanisms and in their locations within a design, no catch-all detection technique exists. In this paper, we propose to apply the principal features of social network analysis to the security analysis of Register Transfer Level (RTL) designs against HTs. The approach investigates design properties and extends current detection techniques. In particular, we perform both node-level and graph-level analysis to determine the direct and indirect interactions between nets in a design. This technique helps not only in finding vulnerable nets that can act as HT triggering signals but also in identifying how their interactions can influence a particular net to act as an HT payload signal. We evaluated the technique on 420 combinational HT instances and, on average, detected both triggering and payload signals with accuracy of up to 97.37%.
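To make "node-level" and "graph-level" analysis of a net graph concrete, the sketch below computes two standard social-network style measures on a hypothetical RTL net graph: degree centrality (direct interactions of a net) and the transitive fan-in cone of a net (indirect interactions that can influence it). These particular measures are illustrative choices, not the paper's feature set, and how such scores are combined into a trigger or payload decision is not stated in the abstract.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (driver net, driven net); the graph here is hypothetical


def degree_centrality(edges: Iterable[Edge]) -> Dict[str, float]:
    """Node-level measure: fraction of other nets each net is directly connected to."""
    edges = list(edges)
    nodes = {n for e in edges for n in e}
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    denom = len(nodes) - 1
    return {n: d / denom if denom else 0.0 for n, d in deg.items()}


def fan_in_cone(edges: Iterable[Edge], target: str) -> Set[str]:
    """Graph-level measure: every net that can influence `target`, directly or indirectly (BFS over predecessors)."""
    preds: Dict[str, List[str]] = {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        net = queue.popleft()
        for p in preds.get(net, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen
```

On a toy graph where nets `a` and `b` drive gate `g1`, and `g1` and `c` drive `out`, `g1` has the highest degree centrality (0.75) and the fan-in cone of `out` is {`a`, `b`, `c`, `g1`}.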
Submitted 21 December, 2019;
originally announced December 2019.
-
Performance Analysis of Machine Learning Techniques to Predict Diabetes Mellitus
Authors:
Md. Faisal Faruque,
Asaduzzaman,
Iqbal H. Sarker
Abstract:
Diabetes mellitus is a common disease caused by a group of metabolic disorders in which blood sugar levels remain high over a prolonged period. It affects different organs of the human body and harms many of the body's systems, in particular the blood vessels and nerves. Early prediction allows the disease to be controlled and can save lives. To this end, this work explores various risk factors related to the disease using machine learning techniques. Machine learning provides an efficient way to extract knowledge by constructing prediction models from diagnostic medical datasets collected from diabetic patients, and such knowledge can be useful for predicting diabetes. In this work, we employ four popular machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), and C4.5 Decision Tree, on adult population data to predict diabetes mellitus. Our experimental results show that the C4.5 decision tree achieved higher accuracy than the other machine learning techniques.
Submitted 10 January, 2019;
originally announced February 2019.
-
Wi-Fi Sensing: Applications and Challenges
Authors:
A. M. Khalili,
Abdel-Hamid Soliman,
Md Asaduzzaman,
Alison Griffiths
Abstract:
Wi-Fi technology has strong potential for indoor and outdoor sensing applications; several important features make it an appealing option compared to other sensing technologies. This paper presents a survey of applications of Wi-Fi based sensing systems, such as elderly people monitoring, activity classification, gesture recognition, people counting, through-the-wall sensing, behind-the-corner sensing, and many others. Challenges and interesting future directions are also highlighted.
Submitted 31 December, 2023; v1 submitted 2 January, 2019;
originally announced January 2019.
-
Speech Enhancement in Adverse Environments Based on Non-stationary Noise-driven Spectral Subtraction and SNR-dependent Phase Compensation
Authors:
Md Tauhidul Islam,
Asaduzzaman,
Celia Shahnaz,
Wei-Ping Zhu,
M. Omair Ahmad
Abstract:
A two-step enhancement method based on spectral subtraction and phase spectrum compensation is presented in this paper for noisy speech in adverse environments involving non-stationary noise and medium to low SNR levels. In the first step, the magnitude of the noisy speech spectrum is modified by a spectral subtraction approach, in which a new noise estimation method based on the low-frequency information of the noisy speech is introduced. We argue that this noise estimation method can estimate non-stationary noise accurately. In the second step, the phase spectrum of the noisy speech is modified by phase spectrum compensation, where an SNR-dependent approach determines the amount of compensation imposed on the phase spectrum. A modified complex spectrum is obtained by combining the magnitude from the spectral subtraction step with the modified phase spectrum from the phase compensation step; this spectrum is found to be a better representation of the enhanced speech spectrum. Speech files from the NOIZEUS database are used to carry out extensive simulations to evaluate the proposed method.
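A single-frame sketch of the two steps helps show how the magnitude and phase are modified separately and then recombined. It assumes the noise magnitude spectrum `noise_mag` has already been estimated (the paper's low-frequency noise estimator is not reproduced) and uses a fixed compensation factor `lam` in place of the paper's SNR-dependent rule; `alpha`, `beta`, and `lam` defaults are illustrative. The phase step follows the generic phase spectrum compensation form, adding an antisymmetric offset proportional to the noise magnitude and taking the real part of the inverse transform. A naive DFT keeps the example dependency-free; a real system would use an FFT with overlapping windows.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def enhance_frame(noisy: Sequence[float], noise_mag: Sequence[float],
                  alpha: float = 2.0, beta: float = 0.01, lam: float = 2.0) -> List[float]:
    """One frame of magnitude spectral subtraction followed by phase spectrum compensation."""
    X = dft(noisy)
    n = len(X)
    Y = []
    for k in range(n):
        # Step 1: power spectral subtraction; the floor beta*|X|^2 prevents negative power.
        power = abs(X[k]) ** 2 - alpha * noise_mag[k] ** 2
        mag = math.sqrt(max(power, beta * abs(X[k]) ** 2))
        # Step 2: antisymmetric offset (+1 below Nyquist, -1 above, 0 at DC and Nyquist).
        psi = 1 if 0 < k < n / 2 else (-1 if k > n / 2 else 0)
        compensated = X[k] + lam * psi * noise_mag[k]
        # Combine the subtracted magnitude with the compensated phase.
        Y.append(mag * cmath.exp(1j * cmath.phase(compensated)))
    # The compensated phase is no longer antisymmetric, so keep the real part.
    return [v.real for v in idft(Y)]
```

With a zero noise estimate the frame passes through unchanged, and with a non-zero estimate the output energy can only decrease, since every bin's magnitude is reduced or kept and taking the real part cannot add energy.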
Submitted 18 February, 2018;
originally announced March 2018.
-
A Generalized Loss Network Model with Overflow for Capacity Planning of a Perinatal Network
Authors:
Md Asaduzzaman,
Thierry J Chaussalet
Abstract:
We develop a generalized loss network framework for capacity planning of a perinatal network in the UK. Decomposing the network by hospitals, we analyze each unit with a GI/G/c/0 overflow loss network model. A two-moment approximation is used to obtain the steady-state solution of the GI/G/c/0 loss systems, and expressions for the rejection and overflow probabilities are derived. Using this framework, the number of required cots can be estimated from the rejection probability at each level of care of the neonatal units in a network. The generalization ensures that the model can be applied to any perinatal network with renewal arrival and discharge processes.
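How a rejection probability turns into a cot count can be shown with the special case of Poisson admissions, where a unit with c cots and no waiting room is an M/G/c/0 loss system whose rejection probability is the Erlang B formula (insensitive to the length-of-stay distribution). This sketch does not reproduce the paper's two-moment approximation for general renewal (GI) arrivals; the function names and parameters are illustrative. Under Poisson arrivals, the rate of rejected admissions that overflow to another unit is the arrival rate multiplied by the rejection probability.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Rejection probability of a loss system with `servers` cots and no waiting room.

    Uses the numerically stable Erlang B recursion
    B(0) = 1, B(k) = a * B(k-1) / (k + a * B(k-1)).
    """
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def cots_needed(arrival_rate: float, mean_stay: float, max_rejection: float) -> int:
    """Smallest number of cots whose rejection probability is at most max_rejection."""
    if not 0 < max_rejection:
        raise ValueError("max_rejection must be positive")
    load = arrival_rate * mean_stay  # offered load in Erlangs (arrivals per day x days per stay)
    c = 0
    while erlang_b(c, load) > max_rejection:
        c += 1
    return c


def overflow_rate(arrival_rate: float, mean_stay: float, cots: int) -> float:
    """Rate of rejected admissions that must overflow elsewhere (Poisson arrivals)."""
    return arrival_rate * erlang_b(cots, arrival_rate * mean_stay)
```

For an offered load of one Erlang, one cot rejects half of all admissions and two cots reject 20%, so a 25% rejection target needs two cots.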
Submitted 24 November, 2011; v1 submitted 29 October, 2011;
originally announced October 2011.
-
Towards a decentralized algorithm for mapping network and computational resources for distributed data-flow computations
Authors:
Shah Asaduzzaman,
Muthucumaru Maheswaran
Abstract:
Several high-throughput distributed data-processing applications require multi-hop processing of data streams. Examples include continual processing of data streams originating from a network of sensors, and composing a multimedia stream by embedding several component streams originating from different locations. These data-flow applications require multiple processing nodes, interconnected according to the application's data-flow topology, to process the data on-stream. Since such applications usually run for long periods, it is important to map the component computations and communications optimally onto the nodes and links of the network, satisfying the capacity constraints and optimizing a quality metric such as end-to-end latency. Unfortunately, the mapping problem is NP-complete, and heuristics have previously been proposed to compute approximate solutions in a centralized way. However, because the network is dynamic, it is practically impossible to aggregate the correct state of the whole network in a single node. In this paper, we present a distributed algorithm for optimal mapping of the components of data-flow applications. We propose several heuristics to minimize the message complexity of the algorithm while maintaining the quality of the solution.
Submitted 25 March, 2009;
originally announced March 2009.
-
Using Dedicated and Opportunistic Networks in Synergy for a Cost-effective Distributed Stream Processing Platform
Authors:
Shah Asaduzzaman,
Muthucumaru Maheswaran
Abstract:
This paper presents a case for exploiting the synergy of dedicated and opportunistic network resources in a distributed hosting platform for data stream processing applications. Our previous studies demonstrated the benefits of combining dedicated, reliable resources with opportunistic resources for high-throughput computing applications, where timely allocation of processing units is the primary concern. Since distributed stream processing applications demand the transmission of large volumes of data between processing sites at a consistent rate, adequate control over network resources is important here to ensure a steady flow of processing. In this paper, we propose a system model for a hybrid hosting platform in which stream processing servers installed at distributed sites are interconnected by a combination of dedicated links and the public Internet. We develop decentralized algorithms that allocate the two classes of network resources among competing tasks, with the objective of higher task throughput and better utilization of expensive dedicated resources. Results from an extensive simulation study show that, with proper management, systems exploiting the synergy of dedicated and opportunistic resources yield considerably higher task throughput, and thus a higher return on investment, than systems that use only expensive dedicated resources.
Submitted 25 March, 2009;
originally announced March 2009.
-
CliqueStream: an efficient and fault-resilient live streaming network on a clustered peer-to-peer overlay
Authors:
Shah Asaduzzaman,
Ying Qiao,
Gregor v. Bochmann
Abstract:
Several overlay-based live multimedia streaming platforms have been proposed in the recent peer-to-peer streaming literature. In most cases, overlay neighbors are chosen randomly for robustness of the overlay. However, this causes nodes that are distant in the underlying physical network to become neighbors, so data travels unnecessary distances before reaching its destination. For efficient bulk data transmission such as multimedia streaming, the overlay neighborhood should reflect proximity in the underlying network. In this paper, we exploit the proximity and redundancy properties of a recently proposed clique-based clustered overlay network, named eQuus, to build efficient as well as robust overlays for multimedia stream dissemination. To combine the efficiency of content pushing over tree-structured overlays with the robustness of data-driven mesh overlays, stable nodes with higher capacity are organized in a tree structure to carry the long-haul traffic, while less stable nodes with intermittent presence are organized in localized meshes. The overlay construction and fault-recovery procedures are explained in detail. A simulation study demonstrates the good locality properties of the platform, and analysis shows that the outage time and control overhead induced by the failure recovery mechanism are minimal.
Submitted 26 March, 2009; v1 submitted 25 March, 2009;
originally announced March 2009.
-
Overlay Structure for Large Scale Content Sharing: Leveraging Geography as the Basis for Routing Locality
Authors:
Shah Asaduzzaman,
Gregor v. Bochmann
Abstract:
In this paper, we present arguments on two related issues in the design of generalized structured peer-to-peer overlays. First, we argue that for large-scale content-sharing applications, the lookup and content transport functions need to be treated separately. Second, we argue that off-the-shelf geographic coordinates of Internet-connected hosts can serve as the basis for a location-based routing overlay suitable for content sharing and other applications. We then outline the design principles and present a design for a generalized routing overlay based on adaptive hierarchical partitioning of the geographical space.
Submitted 24 March, 2009;
originally announced March 2009.
-
Decentralized Management of Bi-modal Network Resources in a Distributed Stream Processing Platform
Authors:
Shah Asaduzzaman,
Muthucumaru Maheswaran
Abstract:
This paper presents resource management techniques for allocating communication and computational resources in a distributed stream processing platform. The platform is designed to exploit the synergy of two classes of network connections: dedicated and opportunistic. Our previous studies demonstrated the benefits of a bi-modal resource organization that combines small pools of dedicated computers with a very large pool of opportunistic computing capacity from idle computers to serve high-throughput computing applications. This paper extends the idea of bi-modal resource organization to the management of communication resources. Since distributed stream processing applications demand the transmission of large volumes of data between processing sites at a consistent rate, adequate control over network resources is important to ensure a steady flow of processing. The system model used in this paper is a platform where stream processing servers at distributed sites are interconnected by a combination of dedicated and opportunistic communication links. Two pertinent resource allocation problems are analyzed in detail and solved using decentralized algorithms. The first is the mapping of stream processing tasks onto the processing and communication resources. The second is the adaptive re-allocation of the opportunistic communication links in response to variations in their capacities. The overall optimization goal is higher task throughput and better utilization of the expensive dedicated links. The evaluation demonstrates that the algorithms are able to exploit the synergy of bi-modal communication links to achieve the optimization goals.
Submitted 24 March, 2009;
originally announced March 2009.
-
GeoP2P: An adaptive peer-to-peer overlay for efficient search and update of spatial information
Authors:
Shah Asaduzzaman,
Gregor v. Bochmann
Abstract:
This paper proposes GeoP2P, a fully decentralized peer-to-peer overlay structure that facilitates geographic location-based search and retrieval of information. Certain limitations of centralized geographic indexes favor a peer-to-peer organization of the information, which, in addition to avoiding performance bottlenecks, allows autonomy over local information. Peer-to-peer systems for geographic or multidimensional range queries built on existing DHTs suffer from inaccuracy in the linearization of the multidimensional space. Other overlay structures based on hierarchical partitioning of the search space are not scalable because they use special super-peers to represent the nodes in the hierarchy. GeoP2P partitions the search space hierarchically, maintains the overlay structure, and performs routing without the need for any super-peers. Although similar fully decentralized overlays have been proposed previously, they lack the ability to dynamically grow and retract the partition hierarchy when the number of peers changes. GeoP2P provides such adaptive features with minimal perturbation of the system state. This adaptation makes both the routing delay and the state size of each peer logarithmic in the total number of peers, irrespective of the size of the multidimensional space. Our analysis also reveals that the overlay structure and the routing algorithm are generic and independent of several aspects of the partitioning hierarchy, such as the geometric shape of the zones or the dimensionality of the search space.
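Hierarchical partitioning of a geographic space can be pictured as assigning each location a zone code, one digit per level of the hierarchy, so that two locations sharing a longer code prefix lie in the same sub-zone further down the tree. The sketch below assumes a binary split that alternates between dimensions at each level; this is a hypothetical choice for illustration, not GeoP2P's own partitioning (the abstract states that the design does not depend on the zone shape). If the hierarchy grows only deep enough to separate the peers, and peers are reasonably spread out, the depth, and hence the number of prefix-extending routing hops, stays close to logarithmic in the number of peers.

```python
from typing import List, Sequence, Tuple


def zone_code(point: Sequence[float], bounds: Sequence[Tuple[float, float]], depth: int) -> Tuple[int, ...]:
    """Path of half-space choices locating `point` in a hierarchy `depth` levels deep.

    Hypothetical partitioning: level i halves dimension (i mod d) of the current zone.
    """
    lo: List[float] = [b[0] for b in bounds]
    hi: List[float] = [b[1] for b in bounds]
    code: List[int] = []
    for level in range(depth):
        dim = level % len(point)
        mid = (lo[dim] + hi[dim]) / 2
        if point[dim] < mid:
            code.append(0)  # lower half of the current zone
            hi[dim] = mid
        else:
            code.append(1)  # upper half of the current zone
            lo[dim] = mid
    return tuple(code)


def shared_levels(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of hierarchy levels two zone codes have in common (length of the common prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

With longitude and latitude bounds of (-180, 180) and (-90, 90), the point (10, 20) gets the depth-4 code (1, 1, 0, 0) and (-100, -50) gets (0, 0, 0, 0); the two share no levels, so a route between them would start at the top of the hierarchy.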
Submitted 22 March, 2009;
originally announced March 2009.