-
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali
Authors:
Saumajit Saha,
Albert Nanda
Abstract:
This paper presents the system we developed for the shared task on violence inciting text detection in Bangla. We explain both the traditional and the recent approaches that we used to train our models. Our proposed system classifies whether a given text contains a threat. We studied the impact of data augmentation when only a limited dataset is available. Our quantitative results show that finetuning a multilingual-e5-base model performed best on our task compared to other transformer-based architectures. We obtained a macro F1 of 68.11% on the test set, and our submission ranked 23rd on the leaderboard.
Submitted 16 October, 2023;
originally announced October 2023.
-
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts
Authors:
Saumajit Saha,
Albert Nanda
Abstract:
Bangla is the 7th most widely spoken language globally, with a staggering 234 million native speakers primarily hailing from India and Bangladesh. This morphologically rich language boasts a rich literary tradition, encompassing diverse dialects and language-specific challenges. Despite its linguistic richness and history, Bangla remains categorized as a low-resource language within the natural language processing (NLP) and speech community. This paper presents our submission to Task 2 (Sentiment Analysis of Bangla Social Media Posts) of the BLP Workshop. We experiment with various Transformer-based architectures to solve this task. Our quantitative results show that transfer learning helps the models learn better in this low-resource language scenario. This becomes evident when we further finetune a model that has already been finetuned on Twitter data for sentiment analysis: that model performs best among all the models we tried. We also perform a detailed error analysis, in which we find some instances where the ground truth labels need to be re-examined. We obtain a micro-F1 of 67.02% on the test set, and our submission ranked 21st on the leaderboard.
Submitted 17 October, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
NeuCASL: From Logic Design to System Simulation of Neuromorphic Engines
Authors:
Dharanidhar Dang,
Amitash Nanda,
Bill Lin,
Debashis Sahoo
Abstract:
With Moore's law saturating and Dennard scaling hitting its wall, traditional von Neumann systems cannot offer the GFLOPS/watt needed for compute-intensive algorithms such as CNNs. Recent trends in unconventional computing approaches give us hope of designing highly energy-efficient computing systems for such algorithms. Neuromorphic computing is one such promising approach, with its brain-inspired circuitry, use of emerging technologies, and low-power nature. Researchers use a variety of novel technologies such as memristors, silicon photonics, FinFET, and carbon nanotubes to demonstrate a neuromorphic computer. However, a flexible CAD tool that starts from neuromorphic logic design and goes up to architectural simulation is yet to be demonstrated to support the rise of this promising paradigm. In this project, we aim to build NeuCASL, an open-source Python-based full-system CAD framework for neuromorphic logic design, circuit simulation, and system performance and reliability estimation. To the best of our knowledge, this is the first of its kind.
Submitted 6 August, 2022;
originally announced August 2022.
-
KOLOMVERSE: KRISO open large-scale image dataset for object detection in the maritime universe
Authors:
Abhilasha Nanda,
Sung Won Cho,
Hyeopwoo Lee,
Jin Hyoung Park
Abstract:
Over the years, datasets have been developed for various object detection tasks. Object detection in the maritime domain is essential for the safety and navigation of ships. However, there is still a lack of publicly available large-scale datasets in the maritime domain. To overcome this challenge, we present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain by KRISO (Korea Research Institute of Ships and Ocean Engineering). We collected 5,845 hours of video data captured from 21 territorial waters of South Korea. Through an elaborate data quality assessment process, we gathered around 2,151,470 4K resolution images from the video data. This dataset considers various environments: weather, time, illumination, occlusion, viewpoint, background, wind speed, and visibility. The KOLOMVERSE consists of five classes (ship, buoy, fishnet buoy, lighthouse and wind farm) for maritime object detection. The dataset has images of 3840×2160 pixels and to our knowledge, it is by far the largest publicly available dataset for object detection in the maritime domain. We performed object detection experiments and evaluated our dataset on several pre-trained state-of-the-art architectures to show the effectiveness and usefulness of our dataset. The dataset is available at: https://github.com/MaritimeDataset/KOLOMVERSE.
Submitted 20 June, 2022;
originally announced June 2022.
-
Reputation, Risk, and Trust on User Adoption of Internet Search Engines: The Case of DuckDuckGo
Authors:
Antonios Saravanos,
Stavros Zervoudakis,
Dongnanzi Zheng,
Amarpreet Nanda,
Georgios Shaheen,
Charles Hornat,
Jeremiah Konde Chaettle,
Alassane Yoda,
Hyeree Park,
Will Ang
Abstract:
This paper investigates the determinants of end-user adoption of the DuckDuckGo search engine, coupling the standard UTAUT model with factors that reflect reputation, risk, and trust. An experimental approach was taken to validate our model: participants were exposed to the DuckDuckGo product using a vignette and subsequently answered questions on their perception of the technology. The data was analyzed using the partial least squares-structural equation modeling (PLS-SEM) approach. Of the nine distinct factors studied, we found that 'Performance Expectancy' played the greatest role in user decisions on adoption, followed by 'Firm Reputation', 'Initial Trust in Technology', 'Social Influence', and an individual's 'Disposition to Trust'. We conclude by exploring how these findings can explain DuckDuckGo's rising prominence as a search engine.
Submitted 24 November, 2022; v1 submitted 19 June, 2022;
originally announced June 2022.
-
Weak-Key Analysis for BIKE Post-Quantum Key Encapsulation Mechanism
Authors:
Mohammad Reza Nosouhi,
Syed W. Shah,
Lei Pan,
Yevhen Zolotavkin,
Ashish Nanda,
Praveen Gauravaram,
Robin Doss
Abstract:
The evolution of quantum computers poses a serious threat to contemporary public-key encryption (PKE) schemes. To address this impending issue, the National Institute of Standards and Technology (NIST) is currently undertaking the Post-Quantum Cryptography (PQC) standardization project, intending to evaluate and subsequently standardize suitable PQC scheme(s). One such attractive approach, called Bit Flipping Key Encapsulation (BIKE), has made it to the final round of the competition. Despite having some attractive features, the IND-CCA security of BIKE depends on the average decoder failure rate (DFR), a higher value of which can facilitate a particular type of side-channel attack. Although BIKE adopts a Black-Grey-Flip (BGF) decoder that offers a negligible DFR, the effect of weak keys on the average DFR has not been fully investigated. Therefore, in this paper, we first implement the BIKE scheme and then show through extensive experiments that weak keys can be a potential threat to the IND-CCA security of the BIKE scheme and thus need attention from the research community prior to standardization. To address this issue, we also propose a key-check algorithm that can potentially supplement the BIKE mechanism and prevent users from generating and adopting weak keys.
Submitted 13 July, 2022; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Adverse Media Mining for KYC and ESG Compliance
Authors:
Rupinder Paul Khandpur,
Albert Aristotle Nanda,
Mathew Davis,
Chen Li,
Daulet Nurmanbetov,
Sankalp Gaur,
Ashit Talukder
Abstract:
In recent years, institutions operating in the global market economy have faced growing risks stemming from non-financial risk factors such as cyber, third-party, and reputational risks, which are outweighing the traditional risks of credit and liquidity. Adverse media or negative news screening is crucial for the identification of such non-financial risks. Typical tools for screening are not real-time, involve manual searches, and require labor-intensive monitoring of information sources. Moreover, they are costly to keep up to date with complex regulatory requirements and the institution's evolving risk appetite.
In this extended abstract, we present an automated system to conduct both real-time and batch search of adverse media for users' queries (person or organization entities) using news and other open-source, unstructured sources of information. Our scalable, machine-learning driven approach to high-precision, adverse news filtering is based on four perspectives - relevance to risk domains, search query (entity) relevance, adverse sentiment analysis, and risk encoding. With the help of model evaluations and case studies, we summarize the performance of our deployed application.
Submitted 21 October, 2021;
originally announced October 2021.
-
Supporting Massive DLRM Inference Through Software Defined Memory
Authors:
Ehsan K. Ardestani,
Changkyu Kim,
Seung Jae Lee,
Luoshang Pan,
Valmiki Rampersad,
Jens Axboe,
Banit Agrawal,
Fuxun Yu,
Ansha Yu,
Trung Le,
Hector Yuen,
Shishir Juluri,
Akshat Nanda,
Manoj Wodekar,
Dheevatsa Mudigere,
Krishnakumar Nair,
Maxim Naumov,
Chris Peterson,
Mikhail Smelyanskiy,
Vijay Rao
Abstract:
Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model size soon to be in the terabyte range, leveraging Storage Class Memory (SCM) for inference enables lower power consumption and cost. This paper evaluates the major challenges in extending the memory hierarchy to SCM for DLRM, and presents different techniques to improve performance through a Software Defined Memory. We show how underlying technologies such as NAND Flash and 3DXP differ and how they relate to real-world scenarios, enabling power savings from 5% to 29%.
Submitted 8 November, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Node Sensing & Dynamic Discovering Routes for Wireless Sensor Networks
Authors:
Arabinda Nanda,
Amiya Kumar Rath,
Saroj Kumar Rout
Abstract:
The applications of Wireless Sensor Networks (WSN) cover a wide variety of scenarios. In most of them, the network is composed of a significant number of nodes deployed over an extensive area in which not all nodes are directly connected, so data exchange is supported by multihop communications. Routing protocols are in charge of discovering and maintaining the routes in the network. However, the suitability of a particular routing protocol mainly depends on the capabilities of the nodes and on the application requirements. This paper presents a dynamic route discovery method for communication between sensor nodes and a base station in a WSN. This method tolerates failures of arbitrary individual nodes in the network (node failure) or of a small part of the network (area failure). Each node in the network performs only local route maintenance, needs to record only its neighbor nodes' information, and incurs no extra routing overhead during failure-free periods. It dynamically discovers new routes when an intermediate node or a small part of the network in the path from a sensor node to a base station fails. In our proposed method, every node decides its path based only on local information, such as its parent node and neighbor nodes' routing information. It is therefore possible for a loop to form in the routing path. We believe that the loop problem in sensor network routing is not as serious as that in Internet routing or traditional mobile ad-hoc routing. We are trying to find all possible loops and eliminate them as far as possible in WSN.
Submitted 9 April, 2010;
originally announced April 2010.