-
FastMig: Leveraging FastFreeze to Establish Robust Service Liquidity in Cloud 2.0
Authors:
Sorawit Manatura,
Thanawat Chanikaphon,
Chantana Chantrapornchai,
Mohsen Amini Salehi
Abstract:
Service liquidity across edge-to-cloud or multi-cloud will serve as the cornerstone of the next generation of cloud computing systems (Cloud 2.0). Provided that cloud-based services are predominantly containerized, an efficient and robust live container migration solution is required to accomplish service liquidity. In a nod to this growing requirement, in this research, we leverage FastFreeze, a popular platform for process checkpoint/restore within a container, and promote it to be a robust solution for end-to-end live migration of containerized services. In particular, we develop a new platform, called FastMig, that proactively controls the checkpoint/restore operations of FastFreeze, thereby allowing for robust live migration of containerized services via standard HTTP interfaces. The proposed platform introduces post-checkpointing and pre-restoration operations to enhance migration robustness. Notably, the pre-restoration operation includes containerized service startup options, enabling warm restoration and reducing the migration downtime. In addition, we develop a method to make FastFreeze robust against failures that commonly happen during the migration and even during the normal operation of a containerized service. Experimental results under real-world settings show that the migration downtime of a containerized service can be reduced by 30X compared to the situation where the original FastFreeze was deployed for the migration. Moreover, we demonstrate that FastMig and the warm restoration method together can significantly mitigate the container startup overhead. Importantly, these improvements are achieved without any significant performance reduction and incur only a small resource usage overhead, compared to the bare (i.e., non-FastFreeze) containerized services.
Submitted 29 June, 2024;
originally announced July 2024.
-
On Non-Interactive Simulation of Distributed Sources with Finite Alphabets
Authors:
Hojat Allah Salehi,
Farhad Shirani
Abstract:
This work presents a Fourier analysis framework for the non-interactive source simulation (NISS) problem. Two distributed agents observe a pair of sequences $X^d$ and $Y^d$ drawn according to a joint distribution $P_{X^dY^d}$. The agents aim to generate outputs $U=f_d(X^d)$ and $V=g_d(Y^d)$ with a joint distribution sufficiently close in total variation to a target distribution $Q_{UV}$. Existing works have shown that the NISS problem with finite-alphabet outputs is decidable. For the binary-output NISS, an upper bound of $O(\exp\operatorname{poly}(\frac{1}{ε}))$ on the input complexity was derived. In this work, the input complexity and algorithm design are addressed in several classes of NISS scenarios. For binary-output NISS scenarios with doubly-symmetric binary inputs, it is shown that the input complexity is $Θ(\log\frac{1}{ε})$, thus providing a super-exponential improvement in input complexity. An explicit characterization of the simulating pair of functions is provided. For general finite-input scenarios, a constructive algorithm is introduced that explicitly finds the simulating functions $(f_d(X^d),g_d(Y^d))$. The approach relies on a novel Fourier analysis framework. Various numerical simulations of NISS scenarios with IID inputs are provided. Furthermore, to illustrate the general applicability of the Fourier framework, several examples with non-IID inputs, including entanglement-assisted NISS and NISS with Markovian inputs, are provided.
Submitted 1 March, 2024;
originally announced March 2024.
-
Perturbed $f(R)$ gravity coupled with neutrinos: exploring cosmological implications
Authors:
Muhammad Yarahmadi,
Amin Salehi,
Kazuharu Bamba
Abstract:
We conduct a thorough examination of cosmological parameters within the context of $f(R)$ gravity coupled with neutrinos, leveraging a diverse array of observational datasets, including Cosmic Microwave Background (CMB), Cosmic Chronometers (CC), Baryon Acoustic Oscillations (BAO), and Pantheon supernova data. Our analysis unveils compelling constraints on pivotal parameters such as the sum of neutrino masses ($\sum m_ν$), the interaction strength parameter ($Γ$), sound speed ($c_s$), Jeans wavenumber ($k_J$), redshift of non-relativistic matter ($z_{\rm nr}$), and the redshift of the Deceleration-Acceleration phase transition ($z_{\rm DA}$). The incorporation of neutrinos within the $f(R)$ gravity framework emerges as a key factor significantly influencing cosmic evolution, intricately shaping the formation of large-scale structures and the dynamics of cosmic expansion. Additionally, a detailed analysis of bulk flow direction and amplitude across various redshifts provides valuable insights into the nature of large-scale structures. A notable aspect of our model is the nuanced integration of $f(R)$ gravity theory with neutrinos, representing a distinctive approach to unraveling cosmological phenomena. This framework, unlike previous models, explicitly considers the impact of neutrinos on gravitational interactions, the formation of large-scale structures, and the overarching dynamics of cosmic expansion within the $f(R)$ gravity paradigm. Furthermore, our study addresses the Hubble tension problem by comparing $H_0$ measurements within our model, offering a potential avenue for reconciling discrepancies. Our findings not only align with existing research but also contribute novel perspectives to our understanding of dark energy, gravitational interactions, and the intricate challenges posed by the Hubble tension.
Submitted 28 February, 2024;
originally announced February 2024.
-
Quantum Advantage in Non-Interactive Source Simulation
Authors:
Hojat Allah Salehi,
Farhad Shirani,
S. Sandeep Pradhan
Abstract:
This work considers the non-interactive source simulation (NISS) problem. In the standard NISS scenario, a pair of distributed agents, Alice and Bob, observe a distributed binary memoryless source $(X^d,Y^d)$ generated based on joint distribution $P_{X,Y}$. The agents wish to produce a pair of discrete random variables $(U_d,V_d)$ with joint distribution $P_{U_d,V_d}$, such that $P_{U_d,V_d}$ converges in total variation distance to a target distribution $Q_{U,V}$. Two variations of the standard NISS scenario are considered. In the first variation, in addition to $(X^d,Y^d)$, the agents have access to a shared Bell state. The agents each measure their respective state, using a measurement of their choice, and use its classical output along with $(X^d,Y^d)$ to simulate the target distribution. This scenario is called the entanglement-assisted NISS (EA-NISS). In the second variation, the agents have access to a classical common random bit $Z$, in addition to $(X^d,Y^d)$. This scenario is called the classical common randomness NISS (CR-NISS). It is shown that for binary-output NISS scenarios, the sets of feasible distributions for EA-NISS and CR-NISS are equal. Hence, there is no quantum advantage in these EA-NISS scenarios. For non-binary output NISS scenarios, it is shown through an example that there are distributions that are feasible in EA-NISS but not in CR-NISS. This shows that there is a quantum advantage in non-binary output EA-NISS.
Submitted 2 May, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Generalized Rastall Gravity coupled with neutrinos could solve the Hubble Tension
Authors:
Muhammad Yarahmadi,
Amin Salehi
Abstract:
The Hubble tension arises when comparing two different methods of determining $H_{0}$: one based on local measurements within our cosmic vicinity and another derived from observations of the early universe, specifically the cosmic microwave background (CMB). In this article, we investigate the Hubble tension by coupling neutrinos to Rastall gravity. We estimate $H_{0}$ in both the Early and the Local universe. The data used in this paper are CMB (plikTTTEEE+lowl+lowE), Lensing, BAO, CC, and Pantheon + Analysis. For the Early universe (CMB + Lensing), the $H_{0}$ value is $69.2 \pm 1.52$, and for the Local universe (CC + BAO + Pantheon + Analysis) it is $70.14 \pm 0.98$. There is a 0.54$σ$ deviation between the $H_{0}$ values in the Early and the Local universe. As a result, it can be concluded that the Hubble tension may be resolved.
Submitted 16 January, 2024;
originally announced January 2024.
-
Resource Allocation of Industry 4.0 Micro-Service Applications across Serverless Fog Federation
Authors:
Razin Farhan Hussain,
Mohsen Amini Salehi
Abstract:
The Industry 4.0 revolution has been made possible via AI-based applications (e.g., for automation and maintenance) deployed on the serverless edge (aka fog) computing platforms at the industrial sites -- where the data is generated. Nevertheless, fulfilling the fault-intolerant and real-time constraints of Industry 4.0 applications on resource-limited fog systems in remote industrial sites (e.g., offshore oil fields) that are uncertain, disaster-prone, and have no cloud access is challenging. It is this challenge that our research aims at addressing. We consider the inelastic nature of the fog systems, the software architecture of the industrial applications (micro-service-based versus monolithic), and the scarcity of human experts in remote sites. To enable cloud-like elasticity, our approach is to dynamically and seamlessly (i.e., without human intervention) federate nearby fog systems. Then, we develop serverless resource allocation solutions that are cognizant of the applications' software architecture, their latency requirements, and the distributed nature of the underlying infrastructure. We propose methods to seamlessly and optimally partition micro-service-based applications across the federated fog. Our experimental evaluation shows that not only is the elasticity problem overcome in a serverless manner, but our application partitioning method can also serve around 20% more tasks on time than the existing methods in the literature.
Submitted 13 January, 2024;
originally announced January 2024.
-
Anisotropic Signatures: Neutrinos -- Dark Energy Interaction and Its Effect on the Transition from Radiation to Matter, and Dark Energy Dominated Phases
Authors:
Muhammad Yarahmadi,
Amin Salehi
Abstract:
This paper explains the significance of neutrino mass in the cosmic progression from the radiation-dominated phase to matter and subsequently to the dark energy-dominated era. We have put a constraint on the total mass of neutrinos by coupling them with quintessence. For the combination of full data (Pantheon+CMB+BAO+CC), we find $\sum m_ν<0.101$ eV (95% CL), and for the relativistic to non-relativistic phase transition a redshift $z_{\rm nr} = 180$, which is in the matter-dominated era. Our findings confirm that when neutrinos become non-relativistic, the universe transitions from a radiation-dominated era to a matter-dominated era. Neutrinos coupled with quintessence (CQ) also have a significant impact on the transition from a matter-dominated era to a dark energy era. We have shown this effect by investigating the impact of neutrino mass on the bulk flow direction and amplitude of bulk velocity. Moreover, we have discussed the impact of this coupling on the CMB power spectrum to show the anisotropy in the universe. Finally, we have established a link between the quintessence field coupled with neutrinos and the bulk flow, which allowed us to demonstrate that the mass of neutrinos could be the cause of anisotropy in the universe.
Submitted 8 December, 2023;
originally announced December 2023.
-
HEET: A Heterogeneity Measure to Quantify the Difference across Distributed Computing Systems
Authors:
Ali Mokhtari,
Saeid Ghafouri,
Pooyan Jamshidi,
Mohsen Amini Salehi
Abstract:
Although system heterogeneity has been extensively studied in the past, there is yet to be a study on measuring the impact of heterogeneity on system performance. For this purpose, we propose a heterogeneity measure that can characterize the impact of the heterogeneity of a system on its performance behavior in terms of throughput or makespan. We develop a mathematical model to characterize a heterogeneous system in terms of its task and machine heterogeneity dimensions and then reduce it to a single value, called Homogeneous Equivalent Execution Time (HEET), which represents the execution time behavior of the entire system. We used AWS EC2 instances to implement a real-world machine learning inference system. Performance evaluation of the HEET score across different heterogeneous system configurations demonstrates that HEET can accurately characterize the performance behavior of these systems. In particular, the results show that our proposed method is capable of predicting the true makespan of heterogeneous systems without online evaluations with an average precision of 84%. This heterogeneity measure is instrumental for solution architects to configure their systems proactively to be sufficiently heterogeneous to meet their desired performance objectives.
Submitted 5 December, 2023;
originally announced December 2023.
-
Accelerating universe in Kaniadakis cosmology without need of dark energy
Authors:
Amin Salehi
Abstract:
Taking into consideration the Kaniadakis entropy associated with the apparent horizon of the Friedmann-Robertson-Walker (FRW) Universe and using the gravity-thermodynamics conjecture, a new cosmological scenario emerges based on corrected Friedmann equations, which contain a correction term $α\left(H^2+\frac{k}{a^2}\right)^{-1}$, where $α\equiv\frac{K^2 π^2}{2 G^2}$ and $K$ is the Kaniadakis parameter. We show that it is possible to reconstruct the parameters of the model analytically in terms of the cosmographic parameters $\{q, j, s\}$. For the flat universe, the parameters can be reconstructed in terms of only two cosmographic parameters $\{q, j\}$. The advantage of this analytical reconstruction is that it provides the possibility to test observational measurements on Kaniadakis cosmology using directly measurable cosmographic parameters.
An interesting result is that, without any assumption about the value of $Λ$, we found that the set $\{q_{0}=-0.708, j_{0}=1.137\}$ automatically gives $Λ\simeq0$ and $\{Ω_{m0}\simeq0.325,Ω_{\alpha0}=0.671\}$. This result is in excellent agreement with previous observational studies. Reconstructing the evolution of the deceleration parameter against redshift $z$ for these values shows that the correction term could play the role of dark energy without any dark energy component or cosmological constant $Λ$. Finally, we formulate the deviation parameter in terms of $\{q,j\}$, which reflects the deviation of the model from the $Λ$CDM model. We show that the deviation factor is very sensitive to the jerk parameter $j$, while $Ω_{m0}$ is sensitive to the deceleration parameter $q_{0}$. Hence, the set $\{j,q\}$ can be regarded as useful parameters to test the theoretical and observational studies in Kaniadakis cosmology.
Submitted 30 August, 2023;
originally announced September 2023.
-
Test of Barrow entropy using a model independent approach
Authors:
Amin Salehi
Abstract:
Taking into consideration a fractal structure for the black hole horizon, Barrow argued that the area law of entropy gets modified due to quantum-gravitational effects. Accordingly, the corrected entropy takes the form $S\sim A^{1+\frac{Δ}{2}}$, where $0\leqΔ\leq1$ indicates the amount of the quantum-gravitational deformation effects. By considering the modified Barrow entropy associated with the apparent horizon, the Friedmann equations get modified as well. We show that, considering a universe filled with matter and a cosmological constant $Λ$, it is possible to determine the amount of deviation from standard cosmology by reconstructing the parameter $Δ$ in terms of curvature parameters $\{q,Q,Ω_{k}\}$ as $Δ=\frac{(Q-1-Ω_k)(1+Ω_k)}{(1+Ω_k+q)^{2}}$. Here, $q$ is the deceleration parameter and $Q$ is the third derivative of the scale factor. This relation provides some advantages. The first is that it indicates that there is a profound connection between quantum-gravitational deformation effects and curvature effects; for $Ω_k\simeq0$, the pair $\{q,Q\}$ can be regarded as deviation curvature factors which reflect the amount of deviation of the model from the standard model. The second interesting feature is that, since this pair are observational parameters which can be directly measured in a model-independent approach, they can be regarded as powerful tools to enable us to put constraints on the parameter $Δ$ and test the Barrow entropy model. Our analysis predicts a value for $Q_{0}$ that deviates slightly from 1, as $(Q_{0}-1)<0.001$. This can be a relatively good target and criterion for theoretical and observational measurements of the parameter $Q_{0}$. Hence, we can hope for and await the improvement of the high-redshift data in the future to support it.
Submitted 26 September, 2023;
originally announced September 2023.
-
UMS: Live Migration of Containerized Services across Autonomous Computing Systems
Authors:
Thanawat Chanikaphon,
Mohsen Amini Salehi
Abstract:
Containerized services deployed within various computing systems, such as edge and cloud, desire live migration support to enable user mobility, elasticity, and load balancing. To enable such a ubiquitous and efficient service migration, a live migration solution needs to handle circumstances where users have various authority levels (full control, limited control, or no control) over the underlying computing systems. Supporting the live migration at these levels serves as the cornerstone of interoperability, and can unlock several use cases across various forms of distributed systems. As such, in this study, we develop a ubiquitous migration solution (called UMS) that, for a given containerized service, can automatically identify the feasible migration approach, and then seamlessly perform the migration across autonomous computing systems. UMS does not interfere with the way the orchestrator handles containers and can coordinate the migration without the orchestrator's involvement. Moreover, UMS is orchestrator-agnostic, i.e., it can be plugged into any underlying orchestrator platform. UMS is equipped with novel methods that can coordinate and perform the live migration at the orchestrator, container, and service levels. Experimental results show that the service-level approach for single-process containers, and the container-level migration approach for multi-process containers with a small (< 128 MiB) memory footprint, lead to the lowest migration overhead and service downtime. To demonstrate the potential of UMS in realizing interoperability and multi-cloud scenarios, we used it to perform live service migration across heterogeneous orchestrators, and between Microsoft Azure and Google Cloud.
Submitted 6 September, 2023;
originally announced September 2023.
-
Integrating LLMs and Decision Transformers for Language Grounded Generative Quality-Diversity
Authors:
Achkan Salehi,
Stephane Doncieux
Abstract:
Quality-Diversity is a branch of stochastic optimization that is often applied to problems from the Reinforcement Learning and control domains in order to construct repertoires of well-performing policies/skills that exhibit diversity with respect to a behavior space. Such archives are usually composed of a finite number of reactive agents, each associated with a unique behavior descriptor, and instantiating behavior descriptors outside of that coarsely discretized space is not straightforward. While a few recent works suggest solutions to that issue, the trajectory that is generated is not easily customizable beyond the specification of a target behavior descriptor. We propose to jointly solve those problems in environments where semantic information about static scene elements is available by leveraging a Large Language Model to augment the repertoire with natural language descriptions of trajectories, and training a policy conditioned on those descriptions. Thus, our method allows a user to not only specify an arbitrary target behavior descriptor, but also provide the model with a high-level textual prompt to shape the generated trajectory. We also propose an LLM-based approach to evaluating the performance of such generative agents. Furthermore, we develop a benchmark based on simulated robot navigation in a 2D maze that we use for experimental validation.
Submitted 25 August, 2023;
originally announced August 2023.
-
Confidential Computing across Edge-to-Cloud for Machine Learning: A Survey Study
Authors:
SM Zobaed,
Mohsen Amini Salehi
Abstract:
Confidential computing has gained prominence due to the escalating volume of data-driven applications (e.g., machine learning and big data) and the acute desire for secure processing of sensitive data, particularly across distributed environments, such as the edge-to-cloud continuum. Provided that the works accomplished in this emerging area are scattered across various research fields, this paper aims at surveying the fundamental concepts, and cutting-edge software and hardware solutions developed for confidential computing using trusted execution environments, homomorphic encryption, and secure enclaves. We underscore the significance of building trust at both the hardware and software levels and delve into their applications, particularly for machine learning (ML). While substantial progress has been made, there are some barely-explored areas that need extra attention from the researchers and practitioners in the community to improve confidentiality aspects, develop more robust attestation mechanisms, and address vulnerabilities of the existing trusted execution environments. Providing a comprehensive taxonomy of the confidential computing landscape, this survey enables researchers to advance this field to ultimately ensure the secure processing of users' sensitive data across a multitude of applications and computing tiers.
Submitted 31 July, 2023;
originally announced July 2023.
-
Computational Pathology: A Survey Review and The Way Forward
Authors:
Mahdi S. Hosseini,
Babak Ehteshami Bejnordi,
Vincent Quoc-Huy Trinh,
Danial Hasan,
Xingwen Li,
Taehyo Kim,
Haochen Zhang,
Theodore Wu,
Kajanan Chinniah,
Sina Maghsoudlou,
Ryan Zhang,
Stephen Yang,
Jiadai Zhu,
Lyndon Chan,
Samir Khaki,
Andrei Buin,
Fatemeh Chaji,
Ala Salehi,
Bich Ngoc Nguyen,
Dimitris Samaras,
Konstantinos N. Plataniotis
Abstract:
Computational Pathology (CPath) is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer that are mainly addressed by CPath tools. With ever-growing developments in deep learning and computer vision algorithms, and the ease of the data flow from digital pathology, CPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific works being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms in clinical practice. This raises a significant question regarding the direction and trends that are undertaken in CPath. In this article, we provide a comprehensive review of more than 800 papers to address the challenges faced, from problem design all the way to the application and implementation viewpoints. We have catalogued each paper into a model-card by examining the key works and challenges faced to lay out the current landscape in CPath. We hope this helps the community to locate relevant works and facilitates understanding of the field's future directions. In a nutshell, we oversee the CPath developments in a cycle of stages which are required to be cohesively linked together to address the challenges associated with such multidisciplinary science. We overview this cycle from different perspectives of data-centric, model-centric, and application-centric problems. We finally sketch remaining challenges and provide directions for future technical developments and clinical integration of CPath (https://github.com/AtlasAnalyticsLab/CPath_Survey).
Submitted 27 January, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
E2C: A Visual Simulator to Reinforce Education of Heterogeneous Computing Systems
Authors:
Ali Mokhtari,
Drake Rawls,
Tony Huynh,
Jeremiah Green,
Mohsen Amini Salehi
Abstract:
With the increasing popularity of accelerator technologies (e.g., GPUs and TPUs) and the emergence of domain-specific computing via ASICs and FPGAs, the matter of heterogeneity and understanding its ramifications on performance has become more critical than ever before. However, it is challenging to effectively educate students about the potential impacts of heterogeneity on the performance of distributed systems, and about the logic of resource allocation methods that efficiently utilize the resources. Making use of real infrastructure for benchmarking the performance of heterogeneous machines, for different applications, with respect to different objectives, and under various workload intensities is cost- and time-prohibitive. To reinforce the quality of learning about various dimensions of heterogeneity, and to decrease the widening gap in education, we develop an open-source simulation tool, called E2C, that can help students and researchers study any type of heterogeneous (or homogeneous) computing system and measure its performance under various configurations. E2C is equipped with an intuitive graphical user interface (GUI) that enables its users to easily examine system-level solutions (scheduling, load balancing, scalability, etc.) in a controlled environment within a short time. E2C is a discrete event simulator that offers the following features: (i) simulating a heterogeneous computing system; (ii) implementing a newly developed scheduling method and plugging it into the system; (iii) measuring energy consumption and other output-related metrics; and (iv) powerful visual aspects to ease the learning curve for students. We used E2C as an assignment in the Distributed and Cloud Computing course. Our anonymous survey study indicates that students rated E2C with a score of 8.7 out of 10 for its usefulness in understanding the concepts of scheduling in heterogeneous computing.
Submitted 20 March, 2023;
originally announced March 2023.
-
Quantum Computation via Multiport Quantum Fourier Optical Processors
Authors:
Mohammad Rezai,
Jawad A. Salehi
Abstract:
The light's image is the primary source of information carrier in nature. Indeed, a single photon's image possesses a vast information capacity that can be harnessed for quantum information processing. Our scheme for implementing quantum information processing via universal multiport processors employs a class of quantum Fourier optical systems composed of spatial phase modulators and 4f-processors with phase-only pupils having a characteristic periodicity that reduces the number of optical resources quadratically as compared to other conventional path encoding techniques. In particular, this paper employs quantum Fourier optics to implement some key quantum logical gates that can be instrumental in optical quantum computations. For instance, we demonstrate the principle by implementing the single-qubit Hadamard and the two-qubit controlled-NOT gates via simulation and optimization techniques. Due to various advantages of the proposed scheme, including the large information capacity of the photon wavefront, a quadratically reduced number of optical resources compared with other conventional path encoding techniques, and dynamic programmability, the proposed scheme has the potential to be an essential contribution to linear optical quantum computing and optical quantum signal processing.
Submitted 7 March, 2023;
originally announced March 2023.
-
Data-efficient, Explainable and Safe Box Manipulation: Illustrating the Advantages of Physical Priors in Model-Predictive Control
Authors:
Achkan Salehi,
Stephane Doncieux
Abstract:
Model-based RL/control have gained significant traction in robotics. Yet, these approaches often remain data-inefficient and lack the explainability of hand-engineered solutions. This makes them difficult to debug/integrate in safety-critical settings. However, in many systems, prior knowledge of environment kinematics/dynamics is available. Incorporating such priors can help address the aforementioned problems by reducing problem complexity and the need for exploration, while also facilitating the expression of the decisions taken by the agent in terms of physically meaningful entities. Our aim with this paper is to illustrate and support this point of view via a case-study. We model a payload manipulation problem based on a real robotic system, and show that leveraging prior knowledge about the dynamics of the environment in an MPC framework can lead to improvements in explainability, safety and data-efficiency, leading to satisfying generalization properties with less data.
Submitted 28 March, 2024; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Fabricating a dielectrophoretic microfluidic device using 3D-printed moulds and silver conductive paint
Authors:
Shayan Valijam,
Daniel P. G. Nilsson,
Dmitry Malyshev,
Rasmus Öberg,
Alireza Salehi,
Magnus Andersson
Abstract:
Dielectrophoresis is an electric field-based technique for moving neutral particles through a fluid. When used for particle separation, dielectrophoresis has many advantages compared to other methods, providing label-free operation with greater control of the separation forces. In this paper, we design, build, and test a low-voltage dielectrophoretic device using a 3D printing approach. This lab-on-a-chip device fits on a microscope glass slide and incorporates microfluidic channels for particle separation. First, we use multiphysics simulations to evaluate the separation efficiency of the prospective device and guide the design process. Second, we fabricate the device in PDMS (polydimethylsiloxane) by using 3D-printed moulds that contain patterns of the channels and electrodes. The imprint of the electrodes is then filled with silver conductive paint, making a 9 pole comb electrode. Lastly, we evaluate the separation efficiency of our device by introducing a mixture of 3 $μ$m and 10 $μ$m polystyrene particles and tracking their progression. Our device is able to efficiently separate these particles when the electrodes are energized with $\pm$12 V at 75 kHz. Overall, our method allows the fabrication of cheap and effective dielectrophoretic microfluidic devices using commercial off-the-shelf equipment.
Submitted 21 February, 2023;
originally announced February 2023.
-
AGNI: In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning
Authors:
Supreeth Mysore Shivanandamurthy,
Sairam Sri Vatsavai,
Ishan Thakkar,
Sayed Ahmad Salehi
Abstract:
Recent years have seen a rapid increase in research activity in the field of DRAM-based Processing-In-Memory (PIM) accelerators, where the analog computing capability of DRAM is employed by minimally changing the inherent structure of DRAM peripherals to accelerate various data-centric applications. Several DRAM-based PIM accelerators for Convolutional Neural Networks (CNNs) have also been reported. Among these, the accelerators leveraging in-DRAM stochastic arithmetic have shown manifold improvements in processing latency and throughput, due to the ability of stochastic arithmetic to convert multiplications into simple bit-wise logical AND operations. However, the use of in-DRAM stochastic arithmetic for CNN acceleration requires frequent stochastic-to-binary number conversions. For that, prior works employ full adder-based or serial counter-based in-DRAM circuits. These circuits consume a large area and incur long latency. Their in-DRAM implementations also require heavy modifications in DRAM peripherals, which significantly diminishes the benefits of using stochastic arithmetic in these accelerators. To address these shortcomings, this paper presents a new substrate for in-DRAM stochastic-to-binary number conversion called AGNI. AGNI makes minor modifications in DRAM peripherals using pass transistors, capacitors, encoders, and charge pumps, and re-purposes the sense amplifiers as voltage comparators, to enable in-situ binary conversion of input statistic operands of different sizes with iso-latency.
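As a rough software illustration of the two ideas the abstract relies on (multiplication as a bit-wise AND over stochastic bit-streams, and stochastic-to-binary conversion as a count of ones), here is a minimal Python sketch. It assumes unipolar encoding and independent streams; the function names are hypothetical and it does not model AGNI's in-DRAM circuitry in any way.

```python
import random

def to_stream(p, n, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bit-stream of length n."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bit-wise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]

def to_binary(bits):
    """Stochastic-to-binary conversion: count the ones in the stream."""
    return sum(bits)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 4096                 # stream length; longer streams give lower error
a = to_stream(0.5, n, rng)
b = to_stream(0.25, n, rng)
product = sc_multiply(a, b)
estimate = to_binary(product) / n   # approximates 0.5 * 0.25
```

The estimate is only approximate: its error shrinks roughly with the square root of the stream length, which is one reason the conversion step is performed so frequently in such accelerators.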
Submitted 11 February, 2023;
originally announced February 2023.
-
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs
Authors:
Sairam Sri Vatsavai,
Venkata Sai Praneeth Karempudi,
Ishan Thakkar,
Ahmad Salehi,
Todd Hastings
Abstract:
The acceleration of a CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations. Several photonic microring resonators (MRRs) based hardware architectures have been proposed to accelerate integer-quantized CNNs with remarkably higher throughput and energy efficiency compared to their electronic counterparts. However, the existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size, which severely restricts their achievable VDP operation size for the quantized input/weight precision of 4 bits and higher. The restricted VDP operation size ultimately suppresses computing throughput to severely diminish the achievable performance benefits. To address this shortcoming, we for the first time present a merger of stochastic computing and MRR-based CNN accelerators. To leverage the innate precision flexibility of stochastic computing, we invent an MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a cascaded manner using dense wavelength division multiplexing, to forge a novel Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA achieves significantly high throughput and energy efficiency for accelerating inferences of high-precision quantized CNNs. Our evaluation for the inference of four modern CNNs at 8-bit input/weight precision indicates that SCONNA provides improvements of up to 66.5x, 90x, and 91x in frames-per-second (FPS), FPS/W and FPS/W/mm2, respectively, on average over two photonic MRR-based analog CNN accelerators from prior work, with Top-1 accuracy drop of only up to 0.4% for large CNNs and up to 1.5% for small CNNs. We developed a transaction-level, event-driven python-based simulator for the evaluation of SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git).
Submitted 14 February, 2023;
originally announced February 2023.
-
Revisiting the history of the evolution of the universe from radiation dominated to dark energy dominated
Authors:
Muhammad Yarahmadi,
Amin Salehi
Abstract:
Most of our knowledge of the universe has been obtained from the anisotropy spectrum of the cosmic microwave background and observations of large-scale structures. During the history of the Universe, neutrinos from the early Universe evolve from a relativistic phase at very early times to a massive-particle behavior at later times. The mass of neutrinos affects the history of the expansion of the universe and the growth of the disturbances of the various components of the cosmic microwave background, therefore the anisotropy spectrum of the cosmic radiation and the observations of the large-scale structures. In this article, by using the coupling of neutrinos with dark energy, we investigate the cosmic evolution from the era of matter dominated to the era of dark energy dominated and show that neutrinos can play an important role in the evolution from radiation dominated to matter dominated and the evolution from Matter dominated to dark energy dominated. Also, we investigate the effect of non-relativistic neutrino on bulk flow and show that the direction of bulk flow has a little difference in scales smaller than 0.1, and the more we consider the scales higher than the local universe, the more difference is observed in the direction of bulk flow.
Submitted 29 October, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
Federated Fog Computing for Remote Industry 4.0 Applications
Authors:
Razin Farhan Hussain,
Mohsen Amini Salehi
Abstract:
Industry 4.0 operates based on IoT devices, sensors, and actuators, transforming the use of computing resources and software solutions in diverse sectors. Various Industry 4.0 latency-sensitive applications function based on machine learning to process sensor data for automation and other industrial activities. Sending sensor data to cloud systems is time consuming and detrimental to the latency constraints of the applications, thus, fog computing is often deployed. Executing these applications across heterogeneous fog systems demonstrates stochastic execution time behavior that affects the task completion time. We investigate and model various Industry 4.0 ML-based applications' stochastic executions and analyze them. Industries like oil and gas are prone to disasters requiring coordination of various latency-sensitive activities. Hence, fog computing resources can get oversubscribed due to the surge in the computing demands during a disaster. We propose federating nearby fog computing systems and forming a fog federation to make remote Industry 4.0 sites resilient against the surge in computing demands. We propose a statistical resource allocation method across fog federation for latency-sensitive tasks. Many of the modern Industry 4.0 applications operate based on a workflow of micro-services that are used alone within an industrial site. As such, industry 4.0 solutions need to be aware of applications' architecture, particularly monolithic vs. micro-service. Therefore, we propose a probability-based resource allocation method that can partition micro-service workflows across fog federation to meet their latency constraints. Another concern in Industry 4.0 is the data privacy of the federated fog. As such, we propose a solution based on federated learning to train industrial ML applications across federated fog systems without compromising the data confidentiality.
Submitted 1 January, 2023;
originally announced January 2023.
-
Load Balancer Tuning: Comparative Analysis of HAProxy Load Balancing Methods
Authors:
Connor Rawls,
Mohsen Amini Salehi
Abstract:
Load balancing is prevalent in practical application (e.g., web) deployments seen today. One such load balancer, HAProxy, remains relevant as an open-source, easy-to-use system. In the context of web systems, the load balancer tier possesses significant influence over system performance and the incurred cost, which is decisive for cloud-based deployments. Therefore, it is imperative to properly tune the load balancer configuration and get the most performance out of the existing resources. In this technical report, we first introduce the HAProxy architecture and its load balancing methods. Then, we discuss fine-tuning parameters within this load balancer and examine their performances in face of various workload intensities. Our evaluation encompasses various types of web requests and homogeneous and heterogeneous back-ends. Lastly, based on the findings of this study, we present a set of best practices to optimally configure HAProxy.
Submitted 29 December, 2022;
originally announced December 2022.
-
E2C: A Visual Simulator for Heterogeneous Computing Systems
Authors:
Ali Mokhtari,
Mohsen Amini Salehi
Abstract:
Heterogeneity has been an indispensable aspect of distributed computing throughout the history of these systems. In particular, with the increasing prevalence of accelerator technologies (e.g., GPUs and TPUs) and the emergence of domain-specific computing via ASICs and FPGA, the matter of heterogeneity and harnessing it has become a more critical challenge than ever before. Harnessing system heterogeneity has been a longstanding challenge in distributed systems and has been investigated extensively in the past. Making use of real infrastructure (such as those offered by the public cloud providers) for benchmarking the performance of heterogeneous machines, for different applications, with respect to different objectives, and under various workload intensities is cost- and time-prohibitive. To mitigate this burden, we develop an open-source simulation tool, called E2C, that can help researchers and practitioners study any type of heterogeneous computing system and measure its performance under various system configurations. E2C has an intuitive graphical user interface (GUI) that enables its users to easily examine system-level solutions (scheduling, load balancing, scalability, etc.) in a controlled environment within a short time and at no cost. In particular, E2C offers the following features: (i) simulating a heterogeneous computing system; (ii) implementing a newly developed scheduling method and plugging it into the system, (iii) measuring energy consumption and other output-related metrics; and (iv) powerful visual aspects to ease the learning curve for students. Potential users of E2C can be undergraduate and graduate students in computer science/engineering, researchers, and practitioners.
Submitted 21 December, 2022;
originally announced December 2022.
-
Evaluation of the Antibacterial and Wound Healing Properties of a Burn Ointment Containing Curcumin, Honey, and Potassium Aluminium
Authors:
Mahsa Shahbandeh,
Mahsa Amin Salehi,
Maryam Soltanyzadeh,
Mehrnaz Mirzaei,
Ali Maleki,
Abdolkarim Chehregani rad,
Mohammad Javad Fatemi,
Reza Mirnejad,
Mostafa Dahmardehei
Abstract:
Burn wounds can severely trouble the health system and life quality of patients. The present study aimed to analyze the synergistic healing properties of curcumin, honey, and potassium alum substances merged in a newly-devised burn ointment on second-degree burn wounds in rats. The MIC and MBC tests on 200 clinical isolates of Pseudomonas aeruginosa are compared to imipenem in vitro. Their killing time and cytotoxicity are also studied using a standard isolate of P. aeruginosa, fibroblast stem cells (FSC) and mouse embryonic fibroblasts (MEF). Furthermore, histopathological and histomorphological assessments are conducted on 150 male Wistar rats within four experimental groups to evaluate the efficiency of the prepared burn ointment. We found significant wound healing in both macroscopical observations and microscopical evaluations. Both curcumin and honey show strong antimicrobial effects with no cytotoxicity. Also, the histopathological results present a considerable and comparable wound re-epithelization in the group of rats treated with both honey and curcumin after 7 days. The burn ointment containing curcumin, honey, and potassium alum shows considerable efficacy in accelerating the healing of experimentally-induced burn wounds in animals. The novel ointment product is proposed as a powerful alternative for the topical treatment of burn injuries.
Submitted 21 November, 2022;
originally announced November 2022.
-
Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge
Authors:
SM Zobaed,
Ali Mokhtari,
Jaya Prakash Champati,
Mathieu Kourouma,
Mohsen Amini Salehi
Abstract:
Smart IoT-based systems often desire continuous execution of multiple latency-sensitive Deep Learning (DL) applications. The edge servers serve as the cornerstone of such IoT-based systems, however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that, DL applications function based on bulky "neural network (NN) models" that cannot be simultaneously maintained in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby, meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to stimulate multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that functions based on the Bayesian theory to predict the inference requests for multi-tenant applications, and uses it to choose the appropriate NN models for loading, hence, increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can stimulate the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss on the inference accuracy of the applications.
Submitted 14 November, 2022;
originally announced November 2022.
-
Adaptive Asynchronous Control Using Meta-learned Neural Ordinary Differential Equations
Authors:
Achkan Salehi,
Steffen Rühl,
Stephane Doncieux
Abstract:
Model-based Reinforcement Learning and Control have demonstrated great potential in various sequential decision making problem domains, including in robotics settings. However, real-world robotics systems often present challenges that limit the applicability of those methods. In particular, we note two problems that jointly happen in many industrial systems: 1) Irregular/asynchronous observations and actions and 2) Dramatic changes in environment dynamics from an episode to another (e.g. varying payload inertial properties). We propose a general framework that overcomes those difficulties by meta-learning adaptive dynamics models for continuous-time prediction and control. The proposed approach is task-agnostic and can be adapted to new tasks in a straight-forward manner. We present evaluations in two different robot simulations and on a real industrial robot.
Submitted 23 October, 2023; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Object as a Service (OaaS): Enabling Object Abstraction in Serverless Clouds
Authors:
Pawissanutt Lertpongrujikorn,
Mohsen Amini Salehi
Abstract:
The Function as a Service (FaaS) paradigm is becoming widespread and is envisioned as the next generation of cloud systems that mitigate the burden for programmers and cloud solution architects. However, the FaaS abstraction only makes the cloud resource management aspects transparent but does not deal with the application data aspects. As such, developers have to undergo the burden of managing the application data, often via separate cloud services (e.g., AWS S3). Similarly, the FaaS abstraction does not natively support function workflow, hence, the developers often have to work with workflow orchestration services (e.g., AWS Step Functions) to build workflows. Moreover, they have to explicitly navigate the data throughout the workflow. To overcome these problems of FaaS, we design a higher-level cloud programming abstraction that hides the complexities and mitigates the burden of developing cloud-native applications. We borrow the notion of object from object-oriented programming and propose a new abstraction level atop the function abstraction, known as Object as a Service (OaaS). OaaS encapsulates the application data and function into the object abstraction and relieves the developers from resource and data management burdens. It also unlocks opportunities for built-in optimization features, such as software reusability, data locality, and caching. OaaS natively supports dataflow programming such that developers define a workflow of functions transparently without getting involved in data navigation, synchronization, and parallelism aspects. We implemented a prototype of the OaaS platform and evaluated it under real-world settings against state-of-the-art platforms regarding the imposed overhead, scalability, and ease of use. The results demonstrate that OaaS streamlines cloud programming and offers scalability with an insignificant overhead to the underlying cloud system.
Submitted 5 September, 2023; v1 submitted 10 June, 2022;
originally announced June 2022.
-
FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems
Authors:
Ali Mokhtari,
Md Abir Hossen,
Pooyan Jamshidi,
Mohsen Amini Salehi
Abstract:
Edge computing enables smart IoT-based systems via concurrent and continuous execution of latency-sensitive machine learning (ML) applications. These edge-based machine learning systems are often battery-powered (i.e., energy-limited). They use heterogeneous resources with diverse computing performance (e.g., CPU, GPU, and/or FPGAs) to fulfill the latency constraints of ML applications. The challenge is to allocate user requests for different ML applications on the Heterogeneous Edge Computing Systems (HEC) with respect to both the energy and latency constraints of these systems. To this end, we study and analyze resource allocation solutions that can increase the on-time task completion rate while considering the energy constraint. Importantly, we investigate edge-friendly (lightweight) multi-objective mapping heuristics that do not become biased toward a particular application type to achieve the objectives; instead, the heuristics consider "fairness" across the concurrent ML applications in their mapping decisions. Performance evaluations demonstrate that the proposed heuristic outperforms widely-used heuristics in heterogeneous systems in terms of the latency and energy objectives, particularly, at low to moderate request arrival rates. We observed 8.9% improvement in on-time task completion rate and 12.6% in energy-saving without imposing any significant overhead on the edge system.
Submitted 20 July, 2022; v1 submitted 31 May, 2022;
originally announced June 2022.
-
UniMorph 4.0: Universal Morphology
Authors:
Khuyagbaatar Batsuren,
Omer Goldman,
Salam Khalifa,
Nizar Habash,
Witold Kieraś,
Gábor Bella,
Brian Leonard,
Garrett Nicolai,
Kyle Gorman,
Yustinus Ghanggo Ate,
Maria Ryskina,
Sabrina J. Mielke,
Elena Budianskaya,
Charbel El-Khaissi,
Tiago Pimentel,
Michael Gasser,
William Lane,
Mohit Raj,
Matt Coler,
Jaime Rafael Montoya Samame,
Delio Siticonatzi Camaiteri,
Benoît Sagot,
Esaú Zumaeta Rojas,
Didier López Francis,
Arturo Oncevay
, et al. (71 additional authors not shown)
Abstract:
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
Submitted 19 June, 2022; v1 submitted 7 May, 2022;
originally announced May 2022.
-
Towards QD-suite: developing a set of benchmarks for Quality-Diversity algorithms
Authors:
Achkan Salehi,
Stephane Doncieux
Abstract:
While the field of Quality-Diversity (QD) has grown into a distinct branch of stochastic optimization, a few problems, in particular locomotion and navigation tasks, have become de facto standards. Are such benchmarks sufficient? Are they representative of the key challenges faced by QD algorithms? Do they provide the ability to focus on one particular challenge by properly disentangling it from others? Do they have much predictive power in terms of scalability and generalization? Existing benchmarks are not standardized, and there is currently no MNIST equivalent for QD. Inspired by recent works on Reinforcement Learning benchmarks, we argue that the identification of challenges faced by QD methods and the development of targeted, challenging, scalable but affordable benchmarks is an important step. As an initial effort, we identify three problems that are challenging in sparse reward settings, and propose associated benchmarks: (1) Behavior metric bias, which can result from the use of metrics that do not match the structure of the behavior space. (2) Behavioral Plateaus, with varying characteristics, such that escaping them would require adaptive QD algorithms and (3) Evolvability Traps, where small variations in genotype result in large behavioral changes. The environments that we propose satisfy the properties listed above.
Submitted 6 May, 2022;
originally announced May 2022.
-
Geodesics, Non-linearities and the Archive of Novelty Search
Authors:
Achkan Salehi,
Alexandre Coninx,
Stephane Doncieux
Abstract:
The Novelty Search (NS) algorithm was proposed more than a decade ago. However, the mechanisms behind its empirical success are still not well formalized/understood. This short note focuses on the effects of the archive on exploration. Experimental evidence from a few application domains suggests that archive-based NS performs in general better than when Novelty is solely computed with respect to the population. An argument that is often encountered in the literature is that the archive prevents exploration from backtracking or cycling, i.e. from revisiting previously encountered areas in the behavior space. We argue that this is not a complete or accurate explanation as backtracking - beside often being desirable - can actually be enabled by the archive. Through low-dimensional/analytical examples, we show that a key effect of the archive is that it counterbalances the exploration biases that result, among other factors, from the use of inadequate behavior metrics and the non-linearities of the behavior mapping. Our observations seem to hint that attributing a more active role to the archive in sampling can be beneficial.
Submitted 6 May, 2022;
originally announced May 2022.
-
Comparison of the interaction between scalar fields as dark energy with neutrinos and put constraint on neutrino mass in these models
Authors:
Muhammad Yarahmadi,
Amin Salehi
Abstract:
In this paper, we consider the interaction of neutrinos with three scalar fields (the phantom, quintessence, and quintom models), put constraints on the total mass of neutrinos, and compare the interaction constant $β$ in these models. The data used in this paper are Type Ia supernovae (Pantheon catalog), CMB, and BAO data. For each model, we first investigate the results obtained from the Pantheon data, then survey the CMB and BAO data, and finally the total data from these catalogs. It seems that using a combination of data produces more favorable results. For the combined data, we find a total neutrino mass of $\sum m_ν < 0.121$ eV (95% confidence level, C.L.) for the quintom model, $\sum m_ν < 0.19$ eV (95% C.L.) for the phantom model, and $\sum m_ν < 0.124$ eV (95% C.L.) for the quintessence model. These results are in good agreement with the results of Planck 2018, where the limit on the total neutrino mass is $\sum m_ν < 0.12$ eV (95% C.L., TT, TE, EE+lowE+lensing+BAO).
Submitted 15 March, 2022;
originally announced March 2022.
-
Can Chameleon fields be the source of both the dark energy dipole and the CMB dipole?
Authors:
M. Yarahmadi,
S. Fathi,
A. Salehi
Abstract:
Recent research shows that the local group is moving toward (l,b)=(276,30) relative to the cosmic background radiation at a speed of about $600\ \mathrm{km\,s^{-1}}$, which is known as the cosmic background radiation dipole. The exact cause of this movement is still unknown. Areas with high mass densities such as galactic superclusters seem to be one of the causes of this flow. There are several methods for simulating the motion of local clusters, one of which is the bulk flow. The bulk flow can be seen as a mass movement of a large part of the universe. This anisotropy at the local scale seems to have the same origin as the anisotropy at the larger scale. In this paper, anisotropies on both small and large scales were investigated using chameleon fields. The data used are Type Ia supernovae (Pantheon catalog for a total of 1,048 supernovae in redshift (0.15<z<2.3)). The results showed that on a smaller scale (less than 150 Mpc) the direction of motion of the local group galaxies is the same as the direction of the bulk flow, and on larger scales, the direction of the bulk flow is the same as the direction of the dark energy dipole.
Submitted 8 March, 2022;
originally announced March 2022.
-
Fundamentals of Quantum Fourier Optics
Authors:
Mohammad Rezai,
Jawad A. Salehi
Abstract:
All-quantum signal processing techniques are at the core of the successful advancement of most information-based quantum technologies. This paper develops coherent and comprehensive methodologies and mathematical models to describe Fourier optical signal processing in full quantum terms for any input quantum state of light. We begin this paper by introducing a spatially two-dimensional quantum state of a photon, associated with its wavefront and expressible as a two-dimensional creation operator. Then, by breaking down the Fourier optical processing apparatus into its key components, we strive to acquire the quantum unitary transformation or the input/output quantum relation of the two-dimensional creation operators. Subsequently, we take advantage of the above results to develop and obtain the quantum analogues of a few essential Fourier optical apparatus, such as quantum convolution via a 4f-processing system and a quantum 4f-processing system with periodic pupils. Moreover, due to the importance and widespread use of optical pulse shaping in various optical communications and optical sciences fields, we also present an analogous system in full quantum terms, namely quantum pulse shaping with an 8f-processing system. Finally, we apply our results to two extreme examples of the quantum state of light. One is based on a coherent (Glauber) state and the other on a single-photon number (Fock) state for each of the above optical systems. We believe the schemes and mathematical models developed in this paper can impact many areas of quantum optical signal processing, quantum holography, quantum communications, quantum radars and multiple-input/multiple-output antennas, and many more applications in quantum computations and quantum machine learning algorithms.
Submitted 24 January, 2022;
originally announced January 2022.
-
SMSE: A Serverless Platform for Multimedia Cloud Systems
Authors:
Chavit Denninnart,
Mohsen Amini Salehi
Abstract:
Along with the rise of domain-specific computing (ASICs hardware) and domain-specific programming languages, we envision that the next step is the emergence of domain-specific cloud platforms. Developing such platforms for popular applications in a serverless manner not only can offer higher efficiency to both users and providers, but can also expedite the application development cycles and enable users to become solution-oriented and focus on their specific business logic. Considering multimedia streaming as one of the most trendy applications in the IT industry, the goal of this study is to develop SMSE, the first domain-specific serverless platform for multimedia streaming. SMSE democratizes multimedia service development by enabling content providers (or even end-users) to rapidly develop their desired functionalities on their multimedia contents. Upon developing SMSE, the next goal of this study is to deal with its efficiency challenges and develop a function container provisioning method that can efficiently utilize cloud resources and improve the users' QoS. In particular, we develop a dynamic method that provisions durable or ephemeral containers depending on the spatiotemporal and data-dependency characteristics of the functions. Evaluating the prototype implementation of SMSE under real-world settings demonstrates its capability to reduce both the containerization overhead and the makespan time of serving multimedia processing functions (by up to 30%) compared to the function provisioning methods that are being used in general-purpose serverless cloud systems.
Submitted 29 September, 2023; v1 submitted 6 January, 2022;
originally announced January 2022.
-
Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications
Authors:
Davood G. Samani,
Mohsen Amini Salehi
Abstract:
Deep Learning-based (DL) applications are becoming increasingly popular and advancing at an unprecedented pace. While many research works are being undertaken to enhance Deep Neural Networks (DNN) -- the centerpiece of DL applications -- practical deployment challenges of these applications in the Cloud and Edge systems, and their impact on the usability of the applications, have not been sufficiently investigated. In particular, the impact of deploying different virtualization platforms, offered by the Cloud and Edge, on the usability of DL applications (in terms of the End-to-End (E2E) inference time) has remained an open question. Importantly, resource elasticity (by means of scale-up), CPU pinning, and processor type (CPU vs GPU) configurations have been shown to be influential on the virtualization overhead. Accordingly, the goal of this research is to study the impact of these potentially decisive deployment options on the E2E performance, and thus the usability, of the DL applications. To that end, we measure the impact of four popular execution platforms (namely, bare-metal, virtual machine (VM), container, and container in VM) on the E2E inference time of four types of DL applications, upon changing processor configuration (scale-up, CPU pinning) and processor types. This study reveals a set of interesting and sometimes counter-intuitive findings that can be used as best practices by Cloud solution architects to efficiently deploy DL applications in various systems. The notable finding is that the solution architects must be aware of the DL application characteristics, particularly their pre- and post-processing requirements, to be able to optimally choose and configure an execution platform, determine the use of GPU, and decide the efficient scale-up range.
Submitted 17 December, 2021;
originally announced December 2021.
-
Efficiency in the Serverless Cloud Paradigm: A Survey on the Reusing and Approximation Aspects
Authors:
Chavit Denninnart,
Thanawat Chanikaphon,
Mohsen Amini Salehi
Abstract:
Serverless computing along with Function-as-a-Service (FaaS) is forming a new computing paradigm that is anticipated to found the next generation of cloud systems. The popularity of this paradigm is due to offering a highly transparent infrastructure that enables user applications to scale in the granularity of their functions. Since these often small and single-purpose functions are managed on shared computing resources behind the scenes, a great potential for computational reuse and approximate computing emerges that, if unleashed, can remarkably improve the efficiency of serverless cloud systems -- both from the user's QoS and system's (energy consumption and incurred cost) perspectives. Accordingly, the goal of this survey study is to, first, unfold the internal mechanics of serverless computing and, second, explore the scope for efficiency within this paradigm via studying function reuse and approximation approaches and discussing the pros and cons of each one. Next, we outline potential future research directions within this paradigm that can either unlock new use cases or make the paradigm more efficient.
Submitted 25 June, 2023; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Few-shot Quality-Diversity Optimization
Authors:
Achkan Salehi,
Alexandre Coninx,
Stephane Doncieux
Abstract:
In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning based control. A notable exception, where to the best of our knowledge little to no effort has been made in this direction, is Quality-Diversity (QD) optimization. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning. However, they remain costly due to their reliance on inherently sample-inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
Submitted 18 January, 2024; v1 submitted 14 September, 2021;
originally announced September 2021.
-
DDCNet-Multires: Effective Receptive Field Guided Multiresolution CNN for Dense Prediction
Authors:
Ali Salehi,
Madhusudhanan Balasubramanian
Abstract:
Dense optical flow estimation is challenging when there are large displacements in a scene with heterogeneous motion dynamics, occlusion, and scene homogeneity. Traditional approaches to handle these challenges include hierarchical and multiresolution processing methods. Learning-based optical flow methods typically use a multiresolution approach with image warping when a broad range of flow velocities and heterogeneous motion is present. Accuracy of such coarse-to-fine methods is affected by the ghosting artifacts when images are warped across multiple resolutions and by the vanishing problem in smaller scene extents with higher motion contrast. Previously, we devised strategies for building compact dense prediction networks guided by the effective receptive field (ERF) characteristics of the network (DDCNet). The DDCNet design was intentionally simple and compact allowing it to be used as a building block for designing more complex yet compact networks. In this work, we extend the DDCNet strategies to handle heterogeneous motion dynamics by cascading DDCNet based sub-nets with decreasing extents of their ERF. Our DDCNet with multiresolution capability (DDCNet-Multires) is compact without any specialized network layers. We evaluate the performance of the DDCNet-Multires network using standard optical flow benchmark datasets. Our experiments demonstrate that DDCNet-Multires improves over the DDCNet-B0 and -B1 and provides optical flow estimates with accuracy comparable to similar lightweight learning-based methods.
Submitted 12 July, 2021;
originally announced July 2021.
-
DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction
Authors:
Ali Salehi,
Madhusudhanan Balasubramanian
Abstract:
Dense pixel matching problems such as optical flow and disparity estimation are among the most challenging tasks in computer vision. Recently, several deep learning methods designed for these problems have been successful. A sufficiently large effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates. In this work, we present a systemic approach to designing network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution. To achieve a larger ERF, we utilized dilated convolutional layers. By aggressively increasing dilation rates in the deeper layers, we were able to achieve a sufficiently large ERF with significantly fewer trainable parameters. We used the optical flow estimation problem as the primary benchmark to illustrate our network design strategy. The benchmark results (Sintel, KITTI, and Middlebury) indicate that our compact networks can achieve comparable performance in the class of lightweight networks.
Submitted 9 July, 2021;
originally announced July 2021.
-
Quantum CDMA Communication Systems
Authors:
Mohammad Rezai,
Jawad A. Salehi
Abstract:
Barcoding photons can provide a host of functionalities that could benefit future quantum communication systems and networks beyond today's imagination. As a significant application of barcoding photons, we introduce code division multiple-access (CDMA) communication systems for various applications. In this context, we introduce and discuss the fundamental principles of a novel quantum CDMA (QCDMA) technique based on spectrally encoding and decoding of continuous-mode quantum light pulses. In particular, we present the mathematical models of various QCDMA modules that are fundamental in describing an ideal and typical QCDMA system, such as quantum signal sources, quantum spectral encoding phase operators, M$\times$M quantum broadcasting star-coupler, quantum spectral phase decoding operators, and the quantum receivers. In describing a QCDMA system, this paper considers a unified approach where the input continuous-mode quantum light pulses can take on any form of pure states such as Glauber states and quantum number states. For input number states, one can observe features like entanglement and quantum interference. More interestingly, due to Heisenberg's uncertainty principle, the quantum signals sent by photon number states obtain complete phase uncertainty at the time of measurement. Therefore, at the receiver output, the multiaccess inter-signal interference vanishes. Due to Heisenberg's uncertainty principle, the received signal intensity at the photodetector's output changes from a coherent detection scheme for input Glauber states to an incoherent detection scheme for input number states. Our mathematical model is valuable in the signal design and data modulations of point-to-point quantum communications, quantum pulse shaping, and quantum radar signals and systems where the inputs are continuous mode quantum signals.
Submitted 18 June, 2021;
originally announced June 2021.
-
ATRIA: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing
Authors:
Supreeth Mysore Shivanandamurthy,
Ishan. G. Thakkar,
Sayed Ahmad Salehi
Abstract:
With the rapidly growing use of Convolutional Neural Networks (CNNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for CNN inference and training have been proposed recently. In this paper, we present ATRIA, a novel bit-pArallel sTochastic aRithmetic based In-DRAM Accelerator for energy-efficient and high-speed inference of CNNs. ATRIA employs light-weight modifications in DRAM cell arrays to implement bit-parallel stochastic arithmetic based acceleration of multiply-accumulate (MAC) operations inside DRAM. ATRIA significantly improves the latency, throughput, and efficiency of processing CNN inferences by performing 16 MAC operations in only five consecutive memory operation cycles. We mapped the inference tasks of four benchmark CNNs on ATRIA to compare its performance with five state-of-the-art in-DRAM CNN accelerators from prior work. The results of our analysis show that ATRIA exhibits only a 3.5% drop in CNN inference accuracy and still achieves improvements of up to 3.2x in frames-per-second (FPS) and up to 10x in efficiency (FPS/W/mm$^2$), compared to the best-performing in-DRAM accelerator from prior work.
Submitted 26 May, 2021;
originally announced May 2021.
-
Harnessing the Potential of Function-Reuse in Multimedia Cloud Systems
Authors:
Chavit Denninnart,
Mohsen Amini Salehi
Abstract:
Cloud-based computing systems can get oversubscribed due to the budget constraints of their users or limitations in certain resource types. The oversubscription can, in turn, degrade the users' perceived Quality of Service (QoS). The approach we investigate to mitigate both the oversubscription and the incurred cost is based on smart reusing of the computation needed to process the service requests (i.e., tasks). We propose a reusing paradigm for the tasks that are waiting for execution. This paradigm can be particularly impactful in serverless platforms where multiple users can request similar services simultaneously. Our motivation is a multimedia streaming engine that processes the media segments in an on-demand manner. We propose a mechanism to identify various types of "mergeable" tasks and aggregate them to improve the QoS and mitigate the incurred cost. We develop novel approaches to determine when and how to perform task aggregation such that the QoS of other tasks is not affected. Evaluation results show that the proposed mechanism can improve the QoS by significantly reducing the percentage of tasks missing their deadlines. In addition, it can reduce the overall time (and subsequently the incurred cost) of utilizing cloud services by more than 9%.
Submitted 9 April, 2021;
originally announced April 2021.
-
BR-NS: an Archive-less Approach to Novelty Search
Authors:
Achkan Salehi,
Alexandre Coninx,
Stephane Doncieux
Abstract:
As open-ended learning based on divergent search algorithms such as Novelty Search (NS) draws more and more attention from the research community, it is natural to expect that its application to increasingly complex real-world problems will require the exploration to operate in higher dimensional Behavior Spaces which will not necessarily be Euclidean. Novelty Search traditionally relies on k-nearest neighbours search and an archive of previously visited behavior descriptors which are assumed to live in a Euclidean space. This is problematic because of a number of issues. On one hand, Euclidean distance and Nearest-neighbour search are known to behave differently and become less meaningful in high dimensional spaces. On the other hand, the archive has to be bounded since, memory considerations aside, the computational complexity of finding nearest neighbours in that archive grows linearithmically with its size. A sub-optimal bound can result in "cycling" in the behavior space, which inhibits the progress of the exploration. Furthermore, the performance of NS depends on a number of algorithmic choices and hyperparameters, such as the strategies to add or remove elements to the archive and the number of neighbours to use in k-nn search. In this paper, we discuss an alternative approach to novelty estimation, dubbed Behavior Recognition based Novelty Search (BR-NS), which does not require an archive, makes no assumption on the metrics that can be defined in the behavior space and does not rely on nearest neighbours search. We conduct experiments to gain insight into its feasibility and dynamics as well as potential advantages over archive-based NS in terms of time complexity.
Submitted 8 April, 2021;
originally announced April 2021.
-
ODIN: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM
Authors:
Supreeth Mysore Shivanandamurthy,
Ishan. G. Thakkar,
Sayed Ahmad Salehi
Abstract:
Due to the very rapidly growing use of Artificial Neural Networks (ANNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for ANNs have been proposed recently. In this paper, we present a novel processing-in-memory (PIM) engine called ODIN that employs hybrid binary-stochastic bit-parallel arithmetic inside phase change RAM (PCRAM) to enable a low-overhead in-situ acceleration of all essential ANN functions such as multiply-accumulate (MAC), nonlinear activation, and pooling. We mapped four ANN benchmark applications on ODIN to compare its performance with a conventional processor-centric design and a crossbar-based in-situ ANN accelerator from prior work. The results of our analysis for the considered ANN topologies indicate that our ODIN accelerator can be at least 5.8x faster and 23.2x more energy-efficient, and up to 90.8x faster and 1554x more energy-efficient, compared to the crossbar-based in-situ ANN accelerator from prior work.
Submitted 5 March, 2021;
originally announced March 2021.
-
SAED: Edge-Based Intelligence for Privacy-Preserving Enterprise Search on the Cloud
Authors:
Sakib M Zobaed,
Mohsen Amini Salehi,
Rajkumar Buyya
Abstract:
Cloud-based enterprise search services (e.g., AWS Kendra) have been entrancing big data owners by offering them convenient, real-time search solutions. However, individuals and organizations possessing confidential big data are hesitant to embrace such services due to valid data privacy concerns. In addition, to offer an intelligent search, these services access the user's search history, which further jeopardizes the user's privacy. To overcome the privacy problem, the main idea of this research is to separate the intelligence aspect of the search from its pattern matching aspect. According to this idea, the search intelligence is provided by an on-premises edge tier, and the shared cloud tier only serves as an exhaustive pattern matching search utility. We propose the Smartness At Edge (SAED) mechanism, which offers intelligence in the form of semantic and personalized search at the edge tier while maintaining the privacy of the search on the cloud tier. At the edge tier, SAED uses a knowledge-based lexical database to expand the query and cover its semantics. SAED personalizes the search via an RNN model that can learn the user's interests. A word embedding model is used to retrieve documents based on their semantic relevance to the search query. SAED is generic and can be plugged into existing enterprise search systems, enabling them to offer intelligent and privacy-preserving search without enforcing any change on them. Evaluation results on two enterprise search systems under real settings, verified by human users, demonstrate that SAED can improve the relevancy of the retrieved results by 24% on average for plain-text datasets and by 75% on average for encrypted generic datasets.
Submitted 11 March, 2021; v1 submitted 26 February, 2021;
originally announced February 2021.
-
SensPick: Sense Picking for Word Sense Disambiguation
Authors:
Sm Zobaed,
Md Enamul Haque,
Md Fazle Rabby,
Mohsen Amini Salehi
Abstract:
Word sense disambiguation (WSD) methods identify the most suitable meaning of a word with respect to the usage of that word in a specific context. Neural network-based WSD approaches rely on a sense-annotated corpus since they do not utilize lexical resources. In this study, we utilize both the context and the related gloss information of a target word to model the semantic relationship between the word and the set of glosses. We propose SensPick, a type of stacked bidirectional Long Short-Term Memory (LSTM) network, to perform the WSD task. The experimental evaluation demonstrates that SensPick outperforms traditional and state-of-the-art models on most of the benchmark datasets, with a relative improvement of 3.5% in F1 score. Although the improvement is not significant, incorporating semantic relationships puts SensPick in the leading position compared to the other models.
Submitted 9 February, 2021;
originally announced February 2021.
-
Analyzing the Performance of Smart Industry 4.0 Applications on Cloud Computing Systems
Authors:
Razin Farhan Hussain,
Alireza Pakravan,
Mohsen Amini Salehi
Abstract:
Cloud-based Deep Neural Network (DNN) applications that make latency-sensitive inference are becoming an indispensable part of Industry 4.0. Due to the multi-tenancy and resource heterogeneity that are both inherent to cloud computing environments, the inference time of DNN-based applications is stochastic. Such stochasticity, if not captured, can potentially lead to low Quality of Service (QoS) or even a disaster in critical sectors, such as the oil and gas industry. To make Industry 4.0 robust, solution architects and researchers need to understand the behavior of DNN-based applications and capture the stochasticity that exists in their inference times. Accordingly, in this study, we provide a descriptive analysis of the inference time from two perspectives. First, we perform an application-centric analysis and statistically model the execution time of four categorically different DNN applications on both Amazon and Chameleon clouds. Second, we take a resource-centric approach and analyze a rate-based metric in the form of Million Instructions Per Second (MIPS) for heterogeneous machines in the cloud. This non-parametric modeling, achieved via Jackknife and Bootstrap re-sampling methods, provides the confidence interval of MIPS for heterogeneous cloud machines. The findings of this research can help researchers and cloud solution architects develop solutions that are robust against the stochastic nature of the inference time of DNN applications in the cloud, offer a higher QoS to their users, and avoid unintended outcomes.
Submitted 10 December, 2020;
originally announced December 2020.
-
Descriptive and Predictive Analysis of Aggregating Functions in Serverless Clouds: the Case of Video Streaming
Authors:
Shangrui Wu,
Chavit Denninnart,
Xiangbo Li,
Yang Wang,
Mohsen Amini Salehi
Abstract:
Serverless clouds allocate multiple tasks (e.g., micro-services) from multiple users on a shared pool of computing resources. This enables serverless cloud providers to reduce their resource usage by transparently aggregating similar tasks of a certain context (e.g., video processing) that share the whole or part of their computation. To this end, it is crucial to know the amount of time saved by aggregating the tasks. Lack of such knowledge can lead to uninformed merging and scheduling decisions that, in turn, can cause deadline violations of either the merged tasks or other following tasks. Accordingly, in this paper, we study the problem of estimating the execution-time saving resulting from merging tasks, using video processing as the example context. To learn the execution-time saving in different forms of merging, we first establish a set of benchmarking videos and examine a wide variety of video processing tasks, with and without merging in place. We observed that although merging can save up to 44% of the execution time, the number of possible merging cases is intractable. Hence, in the second part, we leverage the benchmarking results and develop a method based on Gradient Boosting Decision Tree (GBDT) to estimate the time saving for any given task merging case. Experimental results show that the method can estimate the time saving with an error of 0.04, measured as Root Mean Square Error (RMSE).
Submitted 10 December, 2020;
originally announced December 2020.