-
Permissioned Blockchain-based Framework for Ranking Synthetic Data Generators
Authors:
Narasimha Raghavan Veeraragavan,
Mohammad Hossein Tabatabaei,
Severin Elvatun,
Vibeke Binz Vallevik,
Siri Larønningen,
Jan F Nygård
Abstract:
Synthetic data generation is increasingly recognized as a crucial solution to data-related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.
Submitted 12 May, 2024;
originally announced May 2024.
-
Turning hazardous volatile matter compounds into fuel by catalytic steam reforming: An evolutionary machine learning approach
Authors:
Alireza Shafizadeh,
Hossein Shahbeik,
Mohammad Hossein Nadian,
Vijai Kumar Gupta,
Abdul-Sattar Nizami,
Su Shiung Lam,
Wanxi Peng,
Junting Pan,
Meisam Tabatabaei,
Mortaza Aghbashlo
Abstract:
Chemical and biomass processing systems release volatile matter compounds into the environment daily. Catalytic reforming can convert these compounds into valuable fuels, but developing stable and efficient catalysts is challenging. Machine learning can handle complex relationships in big data and optimize reaction conditions, making it an effective tool for addressing these issues. This study is the first to develop a machine-learning-based research framework for modeling, understanding, and optimizing the catalytic steam reforming of volatile matter compounds. Toluene catalytic steam reforming is used as a case study to show how chemical/textural analyses (e.g., X-ray diffraction analysis) can be used to obtain input features for machine learning models. A database covering a variety of catalyst characteristics and reaction conditions is compiled from the literature. The process is thoroughly analyzed, mechanistically discussed, modeled by six machine learning models, and optimized using the particle swarm optimization algorithm. Ensemble machine learning provides the best prediction performance (R² > 0.976) for toluene conversion and product distribution. The optimal tar conversion (higher than 77.2%) is obtained at temperatures between 637.44 and 725.62 °C, with a steam-to-carbon molar ratio of 5.81-7.15 and a catalyst BET surface area of 476.03-638.55 m²/g. The feature importance analysis reveals the effects of the input descriptors on model prediction: operating conditions (50.9%) and catalyst properties (49.1%) are nearly equally important in modeling. The developed framework can expedite the search for optimal catalyst characteristics and reaction conditions, not only for catalytic chemical processing but also for related research areas.
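The abstract names particle swarm optimization (PSO) as the optimizer but gives no settings. The following is a minimal global-best PSO sketch, assuming the trained ML model is wrapped as a black-box objective to minimize. The toy quadratic objective, the bounds, and the parameter values (inertia `w`, acceleration coefficients `c1`, `c2`, swarm size, iteration count) are hypothetical choices for illustration, not values from the paper.

```python
import random

def pso_minimize(f, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep each coordinate inside its allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical stand-in for "model-predicted loss as a function of two operating conditions".
best, value = pso_minimize(lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2,
                           [(-5.0, 5.0), (-5.0, 5.0)])
```

In a real setting the lambda would be replaced by a call into the fitted regressor (e.g. the negative predicted conversion), with bounds set to the experimentally meaningful ranges of each input.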
Submitted 25 July, 2023;
originally announced August 2023.
-
Using evolutionary machine learning to characterize and optimize co-pyrolysis of biomass feedstocks and polymeric wastes
Authors:
Hossein Shahbeik,
Alireza Shafizadeh,
Mohammad Hossein Nadian,
Dorsa Jeddi,
Seyedali Mirjalili,
Yadong Yang,
Su Shiung Lam,
Junting Pan,
Meisam Tabatabaei,
Mortaza Aghbashlo
Abstract:
Co-pyrolysis of biomass feedstocks with polymeric wastes is a promising strategy for improving the quantity and quality of the resulting liquid fuel. Finding the optimal operating conditions typically requires numerous experimental measurements. However, co-pyrolysis experiments are challenging because they require costly and lengthy procedures. Machine learning (ML) can address these issues by leveraging existing data. This work introduces an evolutionary ML approach to quantify the (by)products of the biomass-polymer co-pyrolysis process. A comprehensive dataset covering various biomass-polymer mixtures under a broad range of process conditions is compiled from the qualified literature. The database was subjected to statistical analysis and mechanistic discussion. The input features are constructed using an innovative approach to reflect the physics of the process, and principal component analysis is applied to reduce their dimensionality. The obtained scores are fed into six ML models. The Gaussian process regression model tuned by the particle swarm optimization algorithm outperforms the other developed models (R² > 0.9, MAE < 0.03, and RMSE < 0.06). The multi-objective particle swarm optimization algorithm successfully finds the optimal values of the independent parameters.
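The performance figures above (R², MAE, RMSE) are standard regression metrics. A short sketch of how they are computed from paired observed and predicted values, assuming R² means the coefficient of determination 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation, which can differ):

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired lists of observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root mean squared error
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return r2, mae, rmse

r2, mae, rmse = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
```

For the toy data above, a single error of 1 out of four points gives MAE 0.25, RMSE 0.5, and R² 0.8; RMSE penalizes large individual errors more heavily than MAE.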
Submitted 24 May, 2023;
originally announced May 2023.
-
Machine learning-based characterization of hydrochar from biomass: Implications for sustainable energy and material production
Authors:
Alireza Shafizadeh,
Hossein Shahbeik,
Shahin Rafiee,
Aysooda Moradi,
Mohammadreza Shahbaz,
Meysam Madadi,
Cheng Li,
Wanxi Peng,
Meisam Tabatabaei,
Mortaza Aghbashlo
Abstract:
Hydrothermal carbonization (HTC) converts biomass into versatile hydrochar without the need for prior drying. The physicochemical properties of hydrochar depend on biomass properties and processing parameters, which makes optimizing hydrochar for specific applications through trial-and-error experiments challenging. To save time and money, machine learning can be used to develop a model that characterizes hydrochar produced from different biomass sources under varying reaction processing parameters. Thus, this study aims to develop an inclusive model to characterize hydrochar using a database covering a range of biomass types and reaction processing parameters. The quality and quantity of hydrochar are predicted using two models (decision tree regression and support vector regression). The decision tree regression model outperforms the support vector regression model in prediction accuracy (R² > 0.88, RMSE < 6.848, and MAE < 4.718). Using an evolutionary algorithm, optimum inputs are identified based on cost functions provided by the selected model to optimize hydrochar for energy production, soil amendment, and pollutant adsorption, resulting in hydrochar yields of 84.31%, 84.91%, and 80.40%, respectively. The feature importance analysis reveals that biomass ash/carbon content and operating temperature are the primary factors affecting hydrochar production in the HTC process.
Submitted 24 May, 2023;
originally announced May 2023.
-
Playing to Learn, or to Keep Secret: Alternating-Time Logic Meets Information Theory
Authors:
Masoud Tabatabaei,
Wojciech Jamroga
Abstract:
Many important properties of multi-agent systems refer to the participants' ability to achieve a given goal, or to prevent an undesirable event in the system. Among intelligent agents, the goals are often epistemic in nature, i.e., they concern the ability to obtain knowledge about an important fact φ. Such properties can be expressed, e.g., in ATLK, that is, alternating-time temporal logic ATL extended with epistemic operators. In many realistic scenarios, however, players do not need to fully learn the truth value of φ. They may be almost as well off by gaining some knowledge; in other words, by reducing their uncertainty about φ. Similarly, in order to keep φ secret, it is often insufficient to require that the intruder never fully learns its truth value. Instead, one needs to require that his uncertainty about φ never drops below a reasonable threshold.
With this motivation in mind, we introduce the logic ATLH, extending ATL with quantitative modalities based on the Hartley measure of uncertainty. The new logic enables one to specify agents' abilities w.r.t. the uncertainty of a given player about a given set of statements. It turns out that ATLH has the same expressivity and model checking complexity as ATLK. However, the new logic is exponentially more succinct than ATLK, which is the main technical result of this paper.
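The Hartley measure of a finite set of alternatives is log₂ of its size. As a rough illustration only, assume a player's uncertainty about a set of statements is measured over the distinct truth-value combinations of those statements across the states the player cannot distinguish from the actual one. This encoding (states as plain values, statements as Python predicates, the function name `hartley_uncertainty`) is a hypothetical simplification, not the paper's formal semantics of ATLH.

```python
import math

def hartley_uncertainty(indistinguishable_states, statements):
    """Hartley measure (in bits) of an agent's uncertainty about a set of statements.

    indistinguishable_states: states the agent cannot tell apart from the actual one
        (non-empty, since the actual state is always among them).
    statements: predicates mapping a state to True/False.
    """
    # Each state induces one combination of truth values; the uncertainty is
    # log2 of how many distinct combinations remain possible for the agent.
    valuations = {tuple(s(state) for s in statements) for state in indistinguishable_states}
    return math.log2(len(valuations))

is_even = lambda s: s % 2 == 0
is_small = lambda s: s < 2
```

With four indistinguishable states `[0, 1, 2, 3]`, the uncertainty about `is_even` alone is 1 bit and about both statements 2 bits; once the agent can only confuse states `0` and `2`, its uncertainty about `is_even` drops to 0. A secrecy requirement of the kind described above would then demand that this value never falls below a chosen threshold.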
Submitted 18 October, 2023; v1 submitted 28 February, 2023;
originally announced March 2023.
-
Understanding blockchain: definitions, architecture, design, and system comparison
Authors:
Mohammad Hossein Tabatabaei,
Roman Vitenberg,
Narasimha Raghavan Veeraragavan
Abstract:
The explosive advent of blockchain technology has led to hundreds of blockchain systems in industry, thousands of academic papers published over the last few years, and an even larger number of new initiatives and projects. Despite the emerging consolidation efforts, the area remains highly turbulent, lacking systematization, educational materials, and cross-system comparative analysis.
In this paper, we provide a systematic and comprehensive study of four popular yet widely different blockchain systems: Bitcoin, Ethereum, Hyperledger Fabric, and IOTA. The study is presented as a cross-system comparison organized by clearly identified aspects: definitions, roles of the participants, entities, and the characteristics and design of each of the commonly used layers in the cross-system blockchain architecture. Our exploration goes deeper than currently available academic surveys and tutorials. For example, we provide the first extensive coverage of the storage layer in Ethereum and the most comprehensive explanation of the consensus protocol in IOTA. The exposition consolidates fragmented information gathered from white and yellow papers, academic publications, blogs, developer documentation, and communication with the developers, together with additional analysis gleaned from the source code. We hope that this survey will help readers gain an in-depth understanding of the design principles behind blockchain systems and contribute towards systematization of the area.
Submitted 26 July, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Automated lung segmentation from CT images of normal and COVID-19 pneumonia patients
Authors:
Faeze Gholamiankhah,
Samaneh Mostafapour,
Nouraddin Abdi Goushbolagh,
Seyedjafar Shojaerazavi,
Parvaneh Layegh,
Seyyed Mohammad Tabatabaei,
Hossein Arabi
Abstract:
Automated semantic image segmentation is an essential step in quantitative image analysis and disease diagnosis. This study investigates the performance of a deep learning-based model for lung segmentation from CT images of normal and COVID-19 patients. Chest CT images and corresponding lung masks of 1200 confirmed COVID-19 cases were used to train a residual neural network. The reference lung masks were generated through semi-automated/manual segmentation of the CT images. The model was evaluated on two distinct external test datasets including 120 normal and COVID-19 subjects, and the results of these groups were compared with each other. Evaluation metrics including the Dice similarity coefficient (DSC), mean absolute error (MAE), relative mean HU difference, and relative volume difference were calculated to assess the accuracy of the predicted lung masks. The proposed deep learning method achieved a DSC of 0.980 and 0.971 for normal and COVID-19 subjects, respectively, demonstrating high overlap between predicted and reference lung masks. Moreover, MAEs of 0.037 HU and 0.061 HU, relative mean HU differences of -2.679% and -4.403%, and relative volume differences of 2.405% and 5.928% were obtained for normal and COVID-19 subjects, respectively. The comparable lung segmentation performance for normal and COVID-19 patients indicates that the model accurately identifies lung tissue in the presence of COVID-19-induced infections (though slightly better performance was observed for normal patients). These promising results demonstrate the reliability of the proposed deep learning-based model for COVID-19 lung segmentation. This prerequisite step could enable more efficient and robust pneumonia lesion analysis.
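Two of the overlap and volume metrics reported above can be sketched on flattened binary masks (1 = lung voxel). This assumes the Dice similarity coefficient 2|P∩R|/(|P|+|R|) and a signed relative volume difference (|P| − |R|)/|R| in percent with uniform voxel spacing; the paper's exact sign convention and normalization are not stated here, so treat both as assumptions.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient 2|P∩R| / (|P| + |R|) for flat binary masks."""
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks are treated as a perfect match.
    return 2.0 * inter / total if total else 1.0

def relative_volume_difference(pred, ref):
    """Signed relative volume difference in percent: (|P| - |R|) / |R| * 100.

    Voxel volume cancels out of the ratio when all voxels have the same size.
    """
    vp = sum(1 for p in pred if p)
    vr = sum(1 for r in ref if r)
    return 100.0 * (vp - vr) / vr
```

For example, masks `[1, 1, 1, 0, 0]` and `[1, 1, 0, 0, 1]` share two voxels, giving a DSC of 4/6 ≈ 0.667, even though their volumes agree exactly; this is why overlap and volume metrics are reported side by side.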
Submitted 5 April, 2021;
originally announced April 2021.
-
Mutually Uncorrelated Primers for DNA-Based Data Storage
Authors:
S. M. Hossein Tabatabaei Yazdi,
Han Mao Kiah,
Ryan Gabrys,
Olgica Milenkovic
Abstract:
We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and for synchronization of communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for primer design in DNA-based data storage systems are also required to be at large mutual Hamming distance from each other, have balanced compositions of symbols, and avoid primer-dimer byproducts. We derive bounds on the size of WMU and various constrained WMU codes and present a number of constructions for balanced, error-correcting, primer-dimer free WMU codes using Dyck paths, prefix-synchronized and cyclic codes.
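The defining WMU property, together with the Hamming-distance and composition requirements, can be checked exhaustively for small codes. A minimal sketch, assuming: all codewords have the same length n; "sufficiently long" means every length l with κ ≤ l < n for a threshold `kappa`; and "balanced composition" for DNA means exactly half of the bases are G or C. These are illustrative readings of the definitions, and the primer-dimer condition is not checked.

```python
from itertools import combinations

def is_wmu(code, kappa):
    """True if no suffix of length l (kappa <= l < n) of any codeword equals the
    length-l prefix of the same or another codeword."""
    for a in code:
        for b in code:          # includes b == a (self-correlation)
            n = len(a)
            for l in range(kappa, n):
                if a[-l:] == b[:l]:
                    return False
    return True

def min_hamming_distance(code):
    """Smallest number of differing positions over all pairs of codewords."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(code, 2))

def is_gc_balanced(word):
    """Balanced composition, assumed here to mean exactly half the bases are G or C."""
    return 2 * sum(base in "GC" for base in word) == len(word)
```

The brute-force check costs O(M²n²) for M codewords of length n, which is fine for verifying small constructions but not for searching large code spaces; the constructions named in the abstract (Dyck paths, prefix-synchronized and cyclic codes) guarantee these properties by design.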
Submitted 13 September, 2017;
originally announced September 2017.
-
Information Security as Strategic (In)effectivity
Authors:
Wojciech Jamroga,
Masoud Tabatabaei
Abstract:
Security of information flow is commonly understood as preventing any information leakage, regardless of how grave or harmless its consequences may be. In this work, we suggest that information security is not a goal in itself, but rather a means of preventing potential attackers from compromising the correct behavior of the system. To formalize this, we first show how two information flows can be compared by looking at the adversary's ability to harm the system. Then, we propose that the information flow in a system is effectively information-secure if it does not allow for more harm than its idealized variant based on the classical notion of noninterference.
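The idealized baseline mentioned above is classical noninterference. For a deterministic program with a secret ("high") input and a public ("low") input, it can be illustrated as: the low-observable output must not change when only the high input changes. This brute-force sketch over finite input sets shows that baseline property only; it does not model the strategic, harm-based comparison the paper proposes, and the example programs are hypothetical.

```python
def noninterferent(program, high_inputs, low_inputs):
    """Deterministic-program noninterference: for each low input, the
    low-observable output must be identical for every high input."""
    for low in low_inputs:
        outputs = {program(high, low) for high in high_inputs}
        if len(outputs) > 1:   # some secret value changes what the observer sees
            return False
    return True

leaks_sum = lambda h, l: h + l          # output depends on the secret
ignores_secret = lambda h, l: 2 * l     # output depends only on public input
leaks_parity = lambda h, l: (h % 2, l)  # leaks a single bit of the secret
```

Under classical noninterference, `leaks_parity` is rejected just as firmly as `leaks_sum`, even though it reveals only one bit; the paper's point is that whether such a leak matters should depend on what harm an attacker can do with it.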
Submitted 7 August, 2016;
originally announced August 2016.
-
Weakly Mutually Uncorrelated Codes
Authors:
S. M. Hossein Tabatabaei Yazdi,
Han Mao Kiah,
Olgica Milenkovic
Abstract:
We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based storage systems and synchronization protocols. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. In addition, WMU sequences used in DNA-based storage systems are required to have balanced compositions of symbols and to be at large mutual Hamming distance from each other. We present a number of constructions for balanced, error-correcting WMU codes using Dyck paths, Knuth's balancing principle, prefix-synchronized and cyclic codes.
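Knuth's balancing principle, in its textbook binary form, states that for any binary word of even length n there is an index k such that flipping the first k bits yields exactly n/2 ones: flipping one more bit changes the weight by ±1, and the weight moves from w at k = 0 to n − w at k = n, so it must pass through n/2. A minimal sketch of that binary version follows; how the paper adapts it to the DNA alphabet and encodes k is not described in the abstract.

```python
def knuth_balance(bits):
    """Return (k, balanced) where flipping the first k bits of the even-length
    binary list `bits` gives equal numbers of 0s and 1s (Knuth's balancing)."""
    n = len(bits)
    if n % 2:
        raise ValueError("Knuth balancing needs an even length")
    weight = sum(bits)  # number of ones after flipping the first k bits (k = 0 here)
    for k in range(n + 1):
        if weight == n // 2:
            return k, [1 - b for b in bits[:k]] + bits[k:]
        if k < n:
            # Flipping bit k changes the weight by +1 (0 -> 1) or -1 (1 -> 0).
            weight += 1 if bits[k] == 0 else -1
    raise AssertionError("unreachable: a balancing index always exists")
```

In a full encoder, the index k would itself be transmitted as a short balanced prefix so that the decoder can undo the flips.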
Submitted 29 January, 2016;
originally announced January 2016.
-
DNA-Based Storage: Trends and Methods
Authors:
S. M. Hossein Tabatabaei Yazdi,
Han Mao Kiah,
Eva Ruiz Garcia,
Jian Ma,
Huimin Zhao,
Olgica Milenkovic
Abstract:
We provide an overview of current approaches to DNA-based storage system design and accompanying synthesis, sequencing and editing methods. We also introduce and analyze a suite of new constrained coding schemes for both archival and random access DNA storage channels. The mathematical basis of our work is the construction and design of sequences over discrete alphabets that avoid pre-specified address patterns, have balanced base content, and exhibit other relevant substring constraints. These schemes adapt the stored signals to the DNA medium and thereby reduce the inherent error rate of the system.
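The kinds of constraints listed above can be illustrated with a simple per-sequence checker. This is a sketch under stated assumptions: "balanced base content" is taken as a GC fraction within a window (the 40-60% bounds are hypothetical), address avoidance as "no forbidden pattern occurs anywhere in the payload", and a maximum homopolymer run length stands in for the unspecified "other relevant substring constraints". None of these thresholds come from the paper.

```python
from itertools import groupby

def gc_content(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base, e.g. 'AAAA' -> 4."""
    return max(len(list(run)) for _, run in groupby(seq))

def satisfies_constraints(seq, address_patterns, gc_low=0.4, gc_high=0.6, max_run=3):
    # Balanced base content: GC fraction must lie in [gc_low, gc_high].
    if not gc_low <= gc_content(seq) <= gc_high:
        return False
    # Hypothetical extra substring constraint: no long homopolymer runs.
    if longest_homopolymer(seq) > max_run:
        return False
    # Address avoidance: no forbidden pattern may occur anywhere in the payload.
    return not any(p in seq for p in address_patterns)
```

A checker like this only filters candidate sequences; the coding schemes described above instead construct sequences that satisfy the constraints by design, which is what makes high-rate encoding possible.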
Submitted 6 July, 2015;
originally announced July 2015.
-
A Rewritable, Random-Access DNA-Based Storage System
Authors:
S. M. Hossein Tabatabaei Yazdi,
Yongbo Yuan,
Jian Ma,
Huimin Zhao,
Olgica Milenkovic
Abstract:
We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile medium suitable for both ultrahigh density archival and rewritable storage applications.
Submitted 8 May, 2015;
originally announced May 2015.
-
On the Relationships among Optimal Symmetric Fix-Free Codes
Authors:
S. M. Hossein Tabatabaei Yazdi,
Serap A. Savari
Abstract:
Symmetric fix-free codes are prefix condition codes in which each codeword is required to be a palindrome. Their study is motivated by the topic of joint source-channel coding. Although they have been considered by a few communities, they are not well understood. In earlier work, we used a collection of instances of Boolean satisfiability problems as a tool in the generation of all optimal binary symmetric fix-free codes with n codewords and observed that the number of different optimal codelength sequences grows slowly compared with the corresponding number for prefix condition codes. We demonstrate that all optimal symmetric fix-free codes can alternatively be obtained by sequences of codes generated by simple manipulations starting from one particular code. We also discuss simplifications in the process of searching for this set of codes.
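The definition above can be checked directly: every codeword must be a palindrome and no codeword may be a prefix of another. For palindromes, prefix-freeness already implies suffix-freeness (if a is a suffix of b, reversing both shows a is a prefix of b), so such a code is automatically fix-free. A small sketch with binary codewords as strings; the Kraft sum is included because any prefix code must satisfy Σ 2^(−len) ≤ 1.

```python
from fractions import Fraction

def is_symmetric_fix_free(code):
    """Every codeword is a palindrome and no codeword is a prefix of another.
    For palindromes, prefix-freeness also implies suffix-freeness (fix-free)."""
    if any(w != w[::-1] for w in code):
        return False
    for a in code:
        for b in code:
            if a != b and b.startswith(a):
                return False
    return True

def kraft_sum(code):
    """Exact Kraft sum of a binary code: sum of 2^(-len(w)) over codewords."""
    return sum(Fraction(1, 2 ** len(w)) for w in code)
```

For example, `["0", "11", "101"]` is a symmetric fix-free code with codelength sequence (1, 2, 3) and Kraft sum 7/8, whereas `["0", "010"]` fails because `"0"` is a prefix of `"010"`. Optimality (minimum expected length for given symbol probabilities) is a separate search over such codes, which is what the abstract's SAT-based generation addresses.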
Submitted 12 November, 2012;
originally announced November 2012.
-
A Deterministic Polynomial-Time Protocol for Synchronizing from Deletions
Authors:
S. M. Sadegh Tabatabaei Yazdi,
Lara Dolecek
Abstract:
In this paper, we consider a synchronization problem between nodes $A$ and $B$ that are connected through a two-way communication channel. Node $A$ contains a binary file $X$ of length $n$ and node $B$ contains a binary file $Y$ that is generated by randomly deleting bits from $X$ at a small deletion rate $\beta$. The locations of the deleted bits are not known to either node $A$ or node $B$. We offer a deterministic synchronization scheme between nodes $A$ and $B$ that needs a total of $O(n\beta\log\frac{1}{\beta})$ transmitted bits and reconstructs $X$ at node $B$ with a probability of error that is exponentially low in the size of $X$. Orderwise, the rate of our scheme matches the optimal rate for this channel.
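The channel model in this setting can be made concrete: Y is obtained from X by deleting each bit independently with probability β, so Y is always a subsequence of X and node B does not know which positions were removed. This sketch assumes independent deletions (an i.i.d. deletion channel) and simulates only the model; it does not implement the synchronization protocol itself.

```python
import random

def deletion_channel(x, beta, seed=None):
    """Delete each bit of x independently with probability beta; return Y."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so each bit survives with probability 1 - beta.
    return [b for b in x if rng.random() >= beta]

def is_subsequence(y, x):
    """True if y can be obtained from x by deleting bits (order preserved)."""
    it = iter(x)
    # `bit in it` advances the iterator, so matches must occur in order.
    return all(bit in it for bit in y)

x = [1, 0, 1, 1, 0, 0, 1, 0] * 10
y = deletion_channel(x, 0.1, seed=1)
```

Ignoring constant factors, the $O(n\beta\log\frac{1}{\beta})$ budget for $n = 10^6$ and $\beta = 0.01$ is on the order of $10^4 \cdot \log_2 100 \approx 6.6 \times 10^4$ bits, far below resending all $10^6$ bits of $X$.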
Submitted 21 August, 2013; v1 submitted 2 July, 2012;
originally announced July 2012.
-
A Deterministic Polynomial-Time Algorithm for Constructing a Multicast Coding Scheme for Linear Deterministic Relay Networks
Authors:
S. M. Sadegh Tabatabaei Yazdi,
Serap A. Savari
Abstract:
We propose a new way to construct a multicast coding scheme for linear deterministic relay networks. Our construction can be regarded as a generalization of the well-known multicast network coding scheme of Jaggi et al. to linear deterministic relay networks and is based on the notion of flow for a unicast session that was introduced by the authors in earlier work. We present randomized and deterministic polynomial-time versions of our algorithm and show that for a network with $g$ destinations, our deterministic algorithm can achieve the capacity in $\left\lceil \log(g+1)\right\rceil$ uses of the network.
Submitted 14 January, 2011;
originally announced January 2011.
-
On Describing the Routing Capacity Regions of Networks
Authors:
Ali Kakhbod,
S. M. Sadegh Tabatabaei Yazdi
Abstract:
The routing capacity region of networks with multiple unicast sessions can be characterized using Farkas' lemma as an infinite set of linear inequalities. In this paper, this result is sharpened by exploiting properties of the solution satisfied by each rate-tuple on the boundary of the capacity region, and a finite description of the routing capacity region that depends on network parameters is offered. For the special case of undirected ring networks, additional results on the complexity of the description are provided.
Submitted 10 February, 2012; v1 submitted 7 April, 2010;
originally announced April 2010.
-
A Combinatorial Study of Linear Deterministic Relay Networks
Authors:
S. M. Sadegh Tabatabaei Yazdi,
Serap A. Savari
Abstract:
In the last few years, the so-called "linear deterministic" model of relay channels has gained popularity as a means of studying the flow of information over wireless communication networks, and this approach generalizes the model of wireline networks which is standard in network optimization. There is recent work extending the celebrated max-flow/min-cut theorem to the capacity of a unicast session over a linear deterministic relay network which is modeled by a layered directed graph. This result was first proved by a random coding scheme over large blocks of transmitted signals. We demonstrate the same result with a simple, deterministic, polynomial-time algorithm which takes as input a single transmitted signal instead of a long block of signals. Our capacity-achieving transmission scheme for a two-layer network requires the extension of a one-dimensional Rado-Hall transversal theorem on the independent subsets of rows of a row-partitioned matrix into a two-dimensional variation for block matrices. To generalize our approach to larger networks, we use the submodularity of the capacity of a cut for our model and show that our complete transmission scheme can be obtained by solving a linear program over the intersection of two polymatroids. We prove that our transmission scheme can achieve the max-flow/min-cut capacity by applying a theorem of Edmonds about such linear programs. We use standard submodular function minimization techniques as part of our polynomial-time algorithm to construct our capacity-achieving transmission scheme.
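For orientation, the wireline special case that the linear deterministic model generalizes is the classical one: the capacity of a unicast session in a directed graph with edge capacities equals the max-flow value, which by the max-flow/min-cut theorem equals the minimum cut capacity. A minimal Edmonds-Karp sketch of that wireline case follows (BFS augmenting paths on a residual graph). It does not implement the linear deterministic model, the Rado-Hall extension, or the polymatroid-intersection program described above; the graph format and node names are hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity is a dict of dicts {u: {v: cap}}."""
    # Residual capacities, with reverse edges initialised to 0 where absent.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow equals the min cut
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the bottleneck amount and update reverse residual edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

graph = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
```

In this small example both the cut around `s` and the cut around `t` have capacity 5, and the computed max-flow value is 5 as well, illustrating the theorem that the paper extends to the linear deterministic setting.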
Submitted 15 April, 2009;
originally announced April 2009.