-
"Golden Ratio Yoshimura" for Meta-Stable and Massively Reconfigurable Deployment
Authors:
Vishrut Deshpande,
Yogesh Phalak,
Ziyang Zhou,
Ian Walker,
Suyi Li
Abstract:
Yoshimura origami is a classical folding pattern that has inspired many deployable structure designs. Its applications span from space exploration, kinetic architectures, and soft robots to even everyday household items. However, despite its wide usage, Yoshimura has been fixated on a set of design constraints to ensure its flat-foldability. Through extensive kinematic analysis and prototype tests…
▽ More
Yoshimura origami is a classical folding pattern that has inspired many deployable structure designs. Its applications span from space exploration, kinetic architectures, and soft robots to even everyday household items. However, despite its wide usage, Yoshimura has been fixated on a set of design constraints to ensure its flat-foldability. Through extensive kinematic analysis and prototype tests, this study presents a new Yoshimura that intentionally defies these constraints. Remarkably, one can impart a unique meta-stability by using the Golden Ratio angle to define the triangular facets of a generalized Yoshimura. As a result, when its facets are strategically popped out, a ``Golden Ratio Yoshimura'' boom with $m$ modules can be theoretically reconfigured into $8^m$ geometrically unique and load-bearing shapes. This result not only challenges the existing design norms but also opens up a new avenue to create deployable and versatile structural systems.
△ Less
Submitted 30 May, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Emergent Abilities in Reduced-Scale Generative Language Models
Authors:
Sherin Muckatira,
Vijeta Deshpande,
Vladislav Lialin,
Anna Rumshisky
Abstract:
Large language models can solve new tasks without task-specific fine-tuning. This ability, also known as in-context learning (ICL), is considered an emergent ability and is primarily seen in large language models with billions of parameters. This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data. To…
▽ More
Large language models can solve new tasks without task-specific fine-tuning. This ability, also known as in-context learning (ICL), is considered an emergent ability and is primarily seen in large language models with billions of parameters. This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data. To explore this, we simplify pre-training data and pre-train 36 causal language models with parameters varying from 1 million to 165 million parameters. We show that models trained on this simplified pre-training data demonstrate enhanced zero-shot capabilities across various tasks in simplified language, achieving performance comparable to that of pre-trained models six times larger on unrestricted language. This suggests that downscaling the language allows zero-shot learning capabilities to emerge in models with limited size. Additionally, we find that these smaller models pre-trained on simplified data demonstrate a power law relationship between the evaluation loss and the three scaling factors: compute, dataset size, and model size.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
LocalTweets to LocalHealth: A Mental Health Surveillance Framework Based on Twitter Data
Authors:
Vijeta Deshpande,
Minhwa Lee,
Zonghai Yao,
Zihao Zhang,
Jason Brian Gibbons,
Hong Yu
Abstract:
Prior research on Twitter (now X) data has provided positive evidence of its utility in develo** supplementary health surveillance systems. In this study, we present a new framework to surveil public health, focusing on mental health (MH) outcomes. We hypothesize that locally posted tweets are indicative of local MH outcomes and collect tweets posted from 765 neighborhoods (census block groups)…
▽ More
Prior research on Twitter (now X) data has provided positive evidence of its utility in develo** supplementary health surveillance systems. In this study, we present a new framework to surveil public health, focusing on mental health (MH) outcomes. We hypothesize that locally posted tweets are indicative of local MH outcomes and collect tweets posted from 765 neighborhoods (census block groups) in the USA. We pair these tweets from each neighborhood with the corresponding MH outcome reported by the Center for Disease Control (CDC) to create a benchmark dataset, LocalTweets. With LocalTweets, we present the first population-level evaluation task for Twitter-based MH surveillance systems. We then develop an efficient and effective method, LocalHealth, for predicting MH outcomes based on LocalTweets. When used with GPT3.5, LocalHealth achieves the highest F1-score and accuracy of 0.7429 and 79.78\%, respectively, a 59\% improvement in F1-score over the GPT3.5 in zero-shot setting. We also utilize LocalHealth to extrapolate CDC's estimates to proxy unreported neighborhoods, achieving an F1-score of 0.7291. Our work suggests that Twitter data can be effectively leveraged to simulate neighborhood-level MH outcomes.
△ Less
Submitted 26 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials
Authors:
Ivan Grega,
Ilyes Batatia,
Gábor Csányi,
Sri Karlapati,
Vikram S. Deshpande
Abstract:
Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a big dataset of structure-property relationships for strut-based lattices. The dataset is…
▽ More
Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a big dataset of structure-property relationships for strut-based lattices. The dataset is made available to the community which can fuel the development of methods anchored in physical principles for the fitting of fourth-order tensors. In addition, we present a higher-order GNN model trained on this dataset. The key features of the model are (i) SE(3) equivariance, and (ii) consistency with the thermodynamic law of conservation of energy. We compare the model to non-equivariant models based on a number of error metrics and demonstrate its benefits in terms of predictive performance and reduced training requirements. Finally, we demonstrate an example application of the model to an architected material design task. The methods which we developed are applicable to fourth-order tensors beyond elasticity such as piezo-optical tensor etc.
△ Less
Submitted 20 March, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Voice-Based Smart Assistant System for Vehicles using RASA
Authors:
Aditya Paranjape,
Yash Patwardhan,
Vedant Deshpande,
Aniket Darp,
Jayashree Jagdale
Abstract:
Conversational AIs, or chatbots, mimic human speech when conversing. Smart assistants facilitate the automation of several tasks that needed human intervention earlier. Because of their accuracy, absence of dependence on human resources, and accessibility around the clock, chatbots can be employed in vehicles too. Due to people's propensity to divert their attention away from the task of driving w…
▽ More
Conversational AIs, or chatbots, mimic human speech when conversing. Smart assistants facilitate the automation of several tasks that needed human intervention earlier. Because of their accuracy, absence of dependence on human resources, and accessibility around the clock, chatbots can be employed in vehicles too. Due to people's propensity to divert their attention away from the task of driving while engaging in other activities like calling, playing music, navigation, and getting updates on the weather forecast and latest news, road safety has declined and accidents have increased as a result. It would be advantageous to automate these tasks using voice commands rather than carrying them out manually. This paper focuses on the development of a voice-based smart assistance application for vehicles based on the RASA framework. The smart assistant provides functionalities like navigation, communication via calls, getting weather forecasts and the latest news updates, and music that are completely voice-based in nature.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
Student Activity Recognition in Classroom Environments using Transfer Learning
Authors:
Anagha Deshpande,
Vedant Deshpande
Abstract:
The recent advances in artificial intelligence and deep learning facilitate automation in various applications including home automation, smart surveillance systems, and healthcare among others. Human Activity Recognition is one of its emerging applications, which can be implemented in a classroom environment to enhance safety, efficiency, and overall educational quality. This paper proposes a sys…
▽ More
The recent advances in artificial intelligence and deep learning facilitate automation in various applications including home automation, smart surveillance systems, and healthcare among others. Human Activity Recognition is one of its emerging applications, which can be implemented in a classroom environment to enhance safety, efficiency, and overall educational quality. This paper proposes a system for detecting and recognizing the activities of students in a classroom environment. The dataset has been structured and recorded by the authors since a standard dataset for this task was not available at the time of this study. Transfer learning, a widely adopted method within the field of deep learning, has proven to be helpful in complex tasks like image and video processing. Pretrained models including VGG-16, ResNet-50, InceptionV3, and Xception are used for feature extraction and classification tasks. Xception achieved an accuracy of 93%, on the novel classroom dataset, outperforming the other three models in consideration. The system proposed in this study aims to introduce a safer and more productive learning environment for students and educators.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through Dialect Identification using Transformer-based Approach
Authors:
Vedant Deshpande,
Yash Patwardhan,
Kshitij Deshpande,
Sudeep Mangalvedhekar,
Ravindra Murumkar
Abstract:
In this paper, we present our approach for the "Nuanced Arabic Dialect Identification (NADI) Shared Task 2023". We highlight our methodology for subtask 1 which deals with country-level dialect identification. Recognizing dialects plays an instrumental role in enhancing the performance of various downstream NLP tasks such as speech recognition and translation. The task uses the Twitter dataset (TW…
▽ More
In this paper, we present our approach for the "Nuanced Arabic Dialect Identification (NADI) Shared Task 2023". We highlight our methodology for subtask 1 which deals with country-level dialect identification. Recognizing dialects plays an instrumental role in enhancing the performance of various downstream NLP tasks such as speech recognition and translation. The task uses the Twitter dataset (TWT-2023) that encompasses 18 dialects for the multi-class classification problem. Numerous transformer-based models, pre-trained on Arabic language, are employed for identifying country-level dialects. We fine-tune these state-of-the-art models on the provided dataset. The ensembling method is leveraged to yield improved performance of the system. We achieved an F1-score of 76.65 (11th rank on the leaderboard) on the test dataset.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Mavericks at ArAIEval Shared Task: Towards a Safer Digital Space -- Transformer Ensemble Models Tackling Deception and Persuasion
Authors:
Sudeep Mangalvedhekar,
Kshitij Deshpande,
Yash Patwardhan,
Vedant Deshpande,
Ravindra Murumkar
Abstract:
In this paper, we highlight our approach for the "Arabic AI Tasks Evaluation (ArAiEval) Shared Task 2023". We present our approaches for task 1-A and task 2-A of the shared task which focus on persuasion technique detection and disinformation detection respectively. Detection of persuasion techniques and disinformation has become imperative to avoid distortion of authentic information. The tasks u…
▽ More
In this paper, we highlight our approach for the "Arabic AI Tasks Evaluation (ArAiEval) Shared Task 2023". We present our approaches for task 1-A and task 2-A of the shared task which focus on persuasion technique detection and disinformation detection respectively. Detection of persuasion techniques and disinformation has become imperative to avoid distortion of authentic information. The tasks use multigenre snippets of tweets and news articles for the given binary classification problem. We experiment with several transformer-based models that were pre-trained on the Arabic language. We fine-tune these state-of-the-art models on the provided dataset. Ensembling is employed to enhance the performance of the systems. We achieved a micro F1-score of 0.742 on task 1-A (8th rank on the leaderboard) and 0.901 on task 2-A (7th rank on the leaderboard) respectively.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale
Authors:
Vijeta Deshpande,
Dan Pechi,
Shree Thatte,
Vladislav Lialin,
Anna Rumshisky
Abstract:
In recent years, language models have drastically grown in size, and the abilities of these models have been shown to improve with scale. The majority of recent scaling laws studies focused on high-compute high-parameter count settings, leaving the question of when these abilities begin to emerge largely unanswered. In this paper, we investigate whether the effects of pre-training can be observed…
▽ More
In recent years, language models have drastically grown in size, and the abilities of these models have been shown to improve with scale. The majority of recent scaling laws studies focused on high-compute high-parameter count settings, leaving the question of when these abilities begin to emerge largely unanswered. In this paper, we investigate whether the effects of pre-training can be observed when the problem size is reduced, modeling a smaller, reduced-vocabulary language. We show the benefits of pre-training with masked language modeling (MLM) objective in models as small as 1.25M parameters, and establish a strong correlation between pre-training perplexity and downstream performance (GLUE benchmark). We examine downscaling effects, extending scaling laws to models as small as ~1M parameters. At this scale, we observe a break of the power law for compute-optimal models and show that the MLM loss does not scale smoothly with compute-cost (FLOPs) below $2.2 \times 10^{15}$ FLOPs. We also find that adding layers does not always benefit downstream performance.
△ Less
Submitted 30 May, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Authors:
Vladislav Lialin,
Vijeta Deshpande,
Anna Rumshisky
Abstract:
This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detai…
▽ More
This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detailed method comparison with a specific focus on real-life efficiency and fine-tuning multibillion-scale language models.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
Transformer Based Implementation for Automatic Book Summarization
Authors:
Siddhant Porwal,
Laxmi Bewoor,
Vivek Deshpande
Abstract:
Document Summarization is the procedure of generating a meaningful and concise summary of a given document with the inclusion of relevant and topic-important points. There are two approaches: one is picking up the most relevant statements from the document itself and adding it to the Summary known as Extractive and the other is generating sentences for the Summary known as Abstractive Summarizatio…
▽ More
Document Summarization is the procedure of generating a meaningful and concise summary of a given document with the inclusion of relevant and topic-important points. There are two approaches: one is picking up the most relevant statements from the document itself and adding it to the Summary known as Extractive and the other is generating sentences for the Summary known as Abstractive Summarization. Training a machine learning model to perform tasks that are time-consuming or very difficult for humans to evaluate is a major challenge. Book Abstract generation is one of such complex tasks. Traditional machine learning models are getting modified with pre-trained transformers. Transformer based language models trained in a self-supervised fashion are gaining a lot of attention; when fine-tuned for Natural Language Processing(NLP) downstream task like text summarization. This work is an attempt to use Transformer based techniques for Abstract generation.
△ Less
Submitted 17 January, 2023;
originally announced January 2023.
-
A Ferroelectric Tunnel Junction-based Integrate-and-Fire Neuron
Authors:
Paolo Gibertini,
Luca Fehlings,
Suzanne Lancaster,
Quang Duong,
Thomas Mikolajick,
Catherine Dubourdieu,
Stefan Slesazeck,
Erika Covi,
Veeresh Deshpande
Abstract:
Event-based neuromorphic systems provide a low-power solution by using artificial neurons and synapses to process data asynchronously in the form of spikes. Ferroelectric Tunnel Junctions (FTJs) are ultra low-power memory devices and are well-suited to be integrated in these systems. Here, we present a hybrid FTJ-CMOS Integrate-and-Fire neuron which constitutes a fundamental building block for new…
▽ More
Event-based neuromorphic systems provide a low-power solution by using artificial neurons and synapses to process data asynchronously in the form of spikes. Ferroelectric Tunnel Junctions (FTJs) are ultra low-power memory devices and are well-suited to be integrated in these systems. Here, we present a hybrid FTJ-CMOS Integrate-and-Fire neuron which constitutes a fundamental building block for new-generation neuromorphic networks for edge computing. We demonstrate electrically tunable neural dynamics achievable by tuning the switching of the FTJ device.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
Extracting Biomedical Factual Knowledge Using Pretrained Language Model and Electronic Health Record Context
Authors:
Zonghai Yao,
Yi Cao,
Zhichao Yang,
Vijeta Deshpande,
Hong Yu
Abstract:
Language Models (LMs) have performed well on biomedical natural language processing applications. In this study, we conducted some experiments to use prompt methods to extract knowledge from LMs as new knowledge Bases (LMs as KBs). However, prompting can only be used as a low bound for knowledge extraction, and perform particularly poorly on biomedical domain KBs. In order to make LMs as KBs more…
▽ More
Language Models (LMs) have performed well on biomedical natural language processing applications. In this study, we conducted some experiments to use prompt methods to extract knowledge from LMs as new knowledge Bases (LMs as KBs). However, prompting can only be used as a low bound for knowledge extraction, and perform particularly poorly on biomedical domain KBs. In order to make LMs as KBs more in line with the actual application scenarios of the biomedical domain, we specifically add EHR notes as context to the prompt to improve the low bound in the biomedical domain. We design and validate a series of experiments for our Dynamic-Context-BioLAMA task. Our experiments show that the knowledge possessed by those language models can distinguish the correct knowledge from the noise knowledge in the EHR notes, and such distinguishing ability can also be used as a new metric to evaluate the amount of knowledge possessed by the model.
△ Less
Submitted 20 October, 2022; v1 submitted 25 August, 2022;
originally announced September 2022.
-
Decentralized Periodic Approach for Adaptive Fault Diagnosis in Distributed Systems
Authors:
Latika Sarna,
Sumedha Shenolikar,
Poorva Kulkarni,
Varsha Deshpande,
Supriya Kelkar
Abstract:
In this paper, Decentralized Periodic Approach for Adaptive Fault Diagnosis (DP-AFD) algorithm is proposed for fault diagnosis in distributed systems with arbitrary topology. Faulty nodes may be either unresponsive, may have either software or hardware faults. The proposed algorithm detects the faulty nodes situated in geographically distributed locations. This algorithm does not depend on a singl…
▽ More
In this paper, Decentralized Periodic Approach for Adaptive Fault Diagnosis (DP-AFD) algorithm is proposed for fault diagnosis in distributed systems with arbitrary topology. Faulty nodes may be either unresponsive, may have either software or hardware faults. The proposed algorithm detects the faulty nodes situated in geographically distributed locations. This algorithm does not depend on a single node or leader to detect the faults in the system. However, it empowers more than one node to detect the fault-free and faulty nodes in the system. Thus, at the end of each test cycle, every fault-free node acts as a leader to diagnose faults in the system. This feature of the algorithm makes it applicable to any arbitrary network. After every test cycle of the algorithm, all the nodes have knowledge about faulty nodes and each node is tested only once. With this knowledge, there can be redistribution of load, which was earlier assigned to the faulty nodes. Also, the algorithm permits repaired node re-entry and new node entry. In a system of n nodes, the maximum number of faulty nodes can be (n-1) which is detected by DP-AFD algorithm. DP-AFD is periodic in nature which executes test cycles after regular intervals to detect the faulty nodes in the given distributed system.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.
-
Fault Diagnosis for Distributed Systems using Accuracy Technique
Authors:
Poorva Kulkarni,
Varsha Deshpande,
Latika Sarna,
Sumedha Shenolikar,
Supriya Kelkar
Abstract:
Distributed Systems involve two or more computer systems which may be situated at geographically distinct locations and are connected by a communication network. Due to failures in the communication link, faults arise which may make the entire system dysfunctional. To enable seamless operation of the distributed system, these faults need to be detected and located accurately. This paper examines v…
▽ More
Distributed Systems involve two or more computer systems which may be situated at geographically distinct locations and are connected by a communication network. Due to failures in the communication link, faults arise which may make the entire system dysfunctional. To enable seamless operation of the distributed system, these faults need to be detected and located accurately. This paper examines various techniques of handling faults in distributed systems and proposes and innovative technique which uses percent accuracy for detecting faulty nodes in the system. Every node in the system acts as an initiator and votes for certifying faulty nodes in the system. This certification is done on the basis of percent accuracy value of each faulty node which should exceed a predefined threshold value to qualify node as faulty. As the threshold increases, the number of faulty nodes detected in the system reduces. This is a decentralized approach with no dependency on a single node to act as a leader for diagnosis. This technique is also applicable to ad-hoc networks, which are static in nature.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.