Search | arXiv e-print repository

arXiv:2405.20079

Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning

Authors: Elena Grazia Gado, Tommaso Martorella, Luca Zunino, Paola Mejia-Domenzain, Vinitra Swamy, Jibril Frej, Tanja Käser

Abstract: Intelligent Tutoring Systems (ITS) enhance personalized learning by predicting student answers to provide immediate and customized instruction. However, recent research has primarily focused on the correctness of the answer rather than the student's performance on specific answer choices, limiting insights into students' thought processes and potential misconceptions. To address this gap, we present MCQStudentBert, an answer forecasting model that leverages the capabilities of Large Language Models (LLMs) to integrate contextual understanding of students' answering history along with the text of the questions and answers. By predicting the specific answer choices students are likely to make, practitioners can easily extend the model to new answer choices or remove answer choices for the same multiple-choice question (MCQ) without retraining the model. In particular, we compare MLP, LSTM, BERT, and Mistral 7B architectures to generate embeddings from students' past interactions, which are then incorporated into a finetuned BERT's answer-forecasting mechanism. We apply our pipeline to a dataset of language learning MCQ, gathered from an ITS with over 10,000 students to explore the predictive accuracy of MCQStudentBert, which incorporates student interaction patterns, in comparison to correct answer prediction and traditional mastery-learning feature-based approaches. This work opens the door to more personalized content, modularization, and granular support.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted as a poster paper at EDM 2024: 17th International Conference on Educational Data Mining in Atlanta, USA

arXiv:2402.02933

InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts

Authors: Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser

Abstract: Interpretability for neural networks is a trade-off between three key requirements: 1) faithfulness of the explanation (i.e., how perfectly it explains the prediction), 2) understandability of the explanation by humans, and 3) model performance. Most existing methods compromise one or more of these requirements; e.g., post-hoc approaches provide limited faithfulness, automatically identified feature masks compromise understandability, and intrinsically interpretable methods such as decision trees limit model performance. These shortcomings are unacceptable for sensitive applications such as education and healthcare, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability, while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction.
We apply variations of the InterpretCC architecture for text, time series and tabular data across several real-world benchmarks, demonstrating comparable performance with non-interpretable baselines, outperforming interpretable-by-design baselines, and showing higher actionability and usefulness according to a user study.

Submitted 29 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2311.16079

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Authors: Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.03311

Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance

Authors: Thiemo Wambsganss, Xiaotian Su, Vinitra Swamy, Seyed Parsa Neshaei, Roman Rietsche, Tanja Käser

Abstract: Large Language Models (LLMs) are increasingly utilized in educational tasks such as providing writing suggestions to students. Despite their potential, LLMs are known to harbor inherent biases which may negatively impact learners. Previous studies have investigated bias in models and data representations separately, neglecting the potential impact of LLM bias on human writing. In this paper, we investigate how bias transfers through an AI writing support pipeline. We conduct a large-scale user study with 231 students writing business case peer reviews in German. Students are divided into five groups with different levels of writing support: one classroom group with feature-based suggestions and four groups recruited from Prolific -- a control group with no assistance, two groups with suggestions from fine-tuned GPT-2 and GPT-3 models, and one group with suggestions from pre-trained GPT-3.5. Using GenBit gender bias analysis, Word Embedding Association Tests (WEAT), and Sentence Embedding Association Test (SEAT) we evaluate the gender bias at various stages of the pipeline: in model embeddings, in suggestions generated by the models, and in reviews written by students. Our results demonstrate that there is no significant difference in gender bias between the resulting peer reviews of groups with and without LLM suggestions. Our research is therefore optimistic about the use of AI writing support in the classroom, showcasing a context where bias in LLMs does not transfer to students' responses.

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted as a full paper at EMNLP Findings 2023

arXiv:2309.14118

MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks

Authors: Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley

Abstract: Predicting multiple real-world tasks in a single model often requires a particularly diverse feature space. Multimodal (MM) models aim to extract the synergistic predictive potential of multiple data types to create a shared feature space with aligned semantic meaning across inputs of drastically varying sizes (i.e. images, text, sound). Most current MM architectures fuse these representations in parallel, which not only limits their interpretability but also creates a dependency on modality availability. We present MultiModN, a multimodal, modular network that fuses latent representations in a sequence of any number, combination, or type of modality while providing granular real-time predictive feedback on any number or combination of predictive tasks. MultiModN's composable pipeline is interpretable-by-design, as well as innately multi-task and robust to the fundamental issue of biased missingness. We perform four experiments on several benchmark MM datasets across 10 real-world tasks (predicting medical diagnoses, academic performance, and weather), and show that MultiModN's sequential MM fusion does not compromise performance compared with a baseline of parallel fusion. By simulating the challenging bias of missing not-at-random (MNAR), this work shows that, contrary to MultiModN, parallel fusion baselines erroneously learn MNAR and suffer catastrophic failure when faced with different patterns of MNAR at inference. To the best of our knowledge, this is the first inherently MNAR-resistant approach to MM modeling.
In conclusion, MultiModN provides granular insights, robustness, and flexibility without compromising performance.

Submitted 6 November, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted as a full paper at NeurIPS 2023 in New Orleans, USA

arXiv:2307.00364

The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

Authors: Vinitra Swamy, Jibril Frej, Tanja Käser

Abstract: Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single post-hoc explainer, whereas recent work has identified systematic disagreement between post-hoc explainers when applied to the same instances of underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose a shift from post-hoc explainability to designing interpretable neural network architectures. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing with InterpretCC and temporal diagnostics with I2MD). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.

Submitted 28 May, 2024; v1 submitted 1 July, 2023; originally announced July 2023.

Comments: Viewpoint paper, under review at JAIR

arXiv:2212.08955

Trusting the Explainers: Teacher Validation of Explainable Artificial Intelligence for Course Design

Authors: Vinitra Swamy, Sijia Du, Mirko Marras, Tanja Käser

Abstract: Deep learning models for learning analytics have become increasingly popular over the last few years; however, these approaches are still not widely adopted in real-world settings, likely due to a lack of trust and transparency. In this paper, we tackle this issue by implementing explainable AI methods for black-box neural networks. This work focuses on the context of online and blended learning and the use case of student success prediction models. We use a pairwise study design, enabling us to investigate controlled differences between pairs of courses. Our analyses cover five course pairs that differ in one educationally relevant aspect and two popular instance-based explainable AI methods (LIME and SHAP). We quantitatively compare the distances between the explanations across courses and methods. We then validate the explanations of LIME and SHAP with 26 semi-structured interviews of university-level educators regarding which features they believe contribute most to student success, which explanations they trust most, and how they could transform these insights into actionable course design decisions. Our results show that quantitatively, explainers significantly disagree with each other about what is important, and qualitatively, experts themselves do not agree on which explanations are most trustworthy. All code, extended results, and the interview protocol are provided at https://github.com/epfl-ml4ed/trusting-explainers.

Submitted 6 March, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

Comments: Accepted as a full paper (Best Paper nominee) at LAK 2023: The 13th International Learning Analytics and Knowledge Conference, March 13-17, 2023, Arlington, Texas, USA

arXiv:2212.01133

RIPPLE: Concept-Based Interpretation for Raw Time Series Models in Education

Authors: Mohammad Asadi, Vinitra Swamy, Jibril Frej, Julien Vignoud, Mirko Marras, Tanja Käser

Abstract: Time series is the most prevalent form of input data for educational prediction tasks. The vast majority of research using time series data focuses on hand-crafted features, designed by experts for predictive performance and interpretability. However, extracting these features is labor-intensive for humans and computers. In this paper, we propose an approach that utilizes irregular multivariate time series modeling with graph neural networks to achieve comparable or better accuracy with raw time series clickstreams in comparison to hand-crafted features. Furthermore, we extend concept activation vectors for interpretability in raw time series models. We analyze these advances in the education domain, addressing the task of early student performance prediction for downstream targeted interventions and instructional support. Our experimental analysis on 23 MOOCs with millions of combined interactions over six behavioral dimensions show that models designed with our approach can (i) beat state-of-the-art educational time series baselines with no feature extraction and (ii) provide interpretable insights for personalized interventions. Source code: https://github.com/epfl-ml4ed/ripple/.

Submitted 28 February, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted as a full paper at AAAI 2023: 37th AAAI Conference on Artificial Intelligence (EAAI: AI for Education Special Track), 7-14 of February 2023, Washington DC, USA

arXiv:2209.10335

Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling

Authors: Thiemo Wambsganss, Vinitra Swamy, Roman Rietsche, Tanja Käser

Abstract: Natural Language Processing (NLP) has become increasingly utilized to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in different domains, they are limited in addressing fine-grained analysis on educational and multilingual corpora. In this work, we analyze bias across text and through multiple architectures on a corpus of 9,165 German peer-reviews collected from university students over five years. Notably, our corpus includes labels such as helpfulness, quality, and critical aspect ratings from the peer-review recipient as well as demographic attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1) our collected corpus in connection with the clustered labels, (2) the most common pre-trained German language models (T5, BERT, and GPT-2) and GloVe embeddings, and (3) the language models after fine-tuning on our collected data-set. In contrast to our initial expectations, we found that our collected corpus does not reveal many biases in the co-occurrence analysis or in the GloVe embeddings. However, the pre-trained German language models find substantial conceptual, racial, and gender bias and have significant changes in bias across conceptual and racial axes during fine-tuning on the peer-review data.
With our research, we aim to contribute to the fourth UN sustainability goal (quality education) with a novel dataset, an understanding of biases in natural language education data, and the potential harms of not counteracting biases in language models for educational tasks.

Submitted 22 September, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted as a full paper at COLING 2022: The 29th International Conference on Computational Linguistics, 12-17 of October 2022, Gyeongju, Republic of Korea

arXiv:2207.00551

Evaluating the Explainers: Black-Box Explainable Machine Learning for Student Success Prediction in MOOCs

Authors: Vinitra Swamy, Bahar Radmehr, Natasa Krco, Mirko Marras, Tanja Käser

Abstract: Neural networks are ubiquitous in applied machine learning for education. Their pervasive success in predictive performance comes alongside a severe weakness, the lack of explainability of their decisions, especially relevant in human-centric fields. We implement five state-of-the-art methodologies for explaining black-box machine learning models (LIME, PermutationSHAP, KernelSHAP, DiCE, CEM) and examine the strengths of each approach on the downstream task of student performance prediction for five massive open online courses. Our experiments demonstrate that the families of explainers do not agree with each other on feature importance for the same Bidirectional LSTM models with the same representative set of students. We use Principal Component Analysis, Jensen-Shannon distance, and Spearman's rank-order correlation to quantitatively cross-examine explanations across methods and courses. Furthermore, we validate explainer performance across curriculum-based prerequisite relationships. Our results come to the concerning conclusion that the choice of explainer is an important decision and is in fact paramount to the interpretation of the predictive results, even more so than the course the model is trained on. Source code and models are released at http://github.com/epfl-ml4ed/evaluating-explainers.

Submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted as a full paper at EDM 2022: The 15th International Conference on Educational Data Mining, 24-27 of July 2022, Durham

arXiv:2205.01064

Meta Transfer Learning for Early Success Prediction in MOOCs

Authors: Vinitra Swamy, Mirko Marras, Tanja Käser

Abstract: Despite the increasing popularity of massive open online courses (MOOCs), many suffer from high dropout and low success rates. Early prediction of student success for targeted intervention is therefore essential to ensure no student is left behind in a course. There exists a large body of research in success prediction for MOOCs, focusing mainly on training models from scratch for individual courses. This setting is impractical in early success prediction as the performance of a student is only known at the end of the course. In this paper, we aim to create early success prediction models that can be transferred between MOOCs from different domains and topics. To do so, we present three novel strategies for transfer: 1) pre-training a model on a large set of diverse courses, 2) leveraging the pre-trained model by including meta information about courses, and 3) fine-tuning the model on previous course iterations. Our experiments on 26 MOOCs with over 145,000 combined enrollments and millions of interactions show that models combining interaction data and course information have comparable or better performance than models which have access to previous iterations of the course. With these models, we aim to effectively enable educators to warm-start their predictions for new and ongoing courses.

Submitted 25 April, 2022; originally announced May 2022.

Comments: Accepted at the 2022 ACM Conference on Learning at Scale (L@S 2022)

arXiv:2111.08546

Interpreting Language Models Through Knowledge Graph Extraction

Authors: Vinitra Swamy, Angelika Romanou, Martin Jaggi

Abstract: Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing community and are commonly used as a starting point for downstream tasks. While these models are undeniably useful, it is a challenge to quantify their performance beyond traditional accuracy metrics. In this paper, we compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process. Structured relationships from training corpora may be uncovered through querying a masked language model with probing tasks. We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements at various stages of RoBERTa's early training. We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa). This work proposes a quantitative framework to compare language models through knowledge graph extraction (GED, Graph2Vec) and showcases a part-of-speech analysis (POSOR) to identify the linguistic strengths of each model variant. Using these metrics, machine learning practitioners can compare models, diagnose their models' behavioral strengths and weaknesses, and identify new targeted datasets to improve model performance.

Submitted 16 November, 2021; originally announced November 2021.

Comments: Published at NeurIPS 2021: eXplainable AI for Debugging and Diagnosis Workshop

arXiv:2110.07525

Connection Management xAPP for O-RAN RIC: A Graph Neural Network and Reinforcement Learning Approach

Authors: Oner Orhan, Vasuki Narasimha Swamy, Thomas Tetzlaff, Marcel Nassar, Hosein Nikopour, Shilpa Talwar

Abstract: Connection management is an important problem for any wireless network to ensure smooth and well-balanced operation throughout. Traditional methods for connection management (specifically user-cell association) consider sub-optimal and greedy solutions such as connection of each user to a cell with maximum receive power. However, network performance can be improved by leveraging machine learning (ML) and artificial intelligence (AI) based solutions. The next generation software defined 5G networks defined by the Open Radio Access Network (O-RAN) alliance facilitates the inclusion of ML/AI based solutions for various network problems. In this paper, we consider intelligent connection management based on the O-RAN network architecture to optimize user association and load balancing in the network. We formulate connection management as a combinatorial graph optimization problem. We propose a deep reinforcement learning (DRL) solution that uses the underlying graph to learn the weights of the graph neural networks (GNN) for optimal user-cell association. We consider three candidate objective functions: sum user throughput, cell coverage, and load balancing. Our results show up to 10% gain in throughput, 45-140% gain cell coverage, 20-45% gain in load balancing depending on network deployment configurations compared to baseline greedy techniques.

Submitted 20 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: paper accepted to the IEEE International Conference on Machine Learning and Applications (ICMLA 2021)

arXiv:1806.08777

Wireless Channel Dynamics and Robustness for Ultra-Reliable Low-Latency Communications

Authors: Vasuki Narasimha Swamy, Paul Rigge, Gireeja Ranade, Borivoje Nikolic, Anant Sahai

Abstract: Interactive, immersive and critical applications demand ultra-reliable low-latency communication (URLLC). To build wireless communication systems that can support these applications, understanding the characteristics of the wireless medium is paramount. Although wireless channel characteristics and dynamics have been extensively studied, it is important to revisit these concepts in the context of the strict demands of low latency and ultra-reliability. In this paper, we bring a modeling approach from robust control to wireless communication -- the wireless channel characteristics are given a nominal model around which we allow for some quantified uncertainty. We propose certain key "directions" along which to bound model uncertainty that are relevant to URLLC. For the nominal model, we take an in-depth look at wireless channel characteristics such as spatial and temporal correlations based on Jakes' model. Contrary to what has been claimed in the literature, we find that standard Rayleigh fading processes are not bandlimited. This has significant implications on the predictability of channels. We also find that under reasonable conditions the spatial correlation of channels provide a fading distribution that is not too far off from an independent spatial fading model. Additionally, we look at the impact of these channel models on cooperative communication based systems.
We find that while spatial-diversity-based techniques are necessary to combat the adverse effects of fading, time-diversity-based techniques are necessary to be robust against unmodeled errors. Robust URLLC systems need to operate with both an SNR margin and a time/repetition margin.

Submitted 22 June, 2018; originally announced June 2018.

Comments: Submitted to IEEE JSAC Special Issue on Ultra-Reliable Low-Latency Communications in Wireless Networks

arXiv:1803.05143

Network Coding for Real-time Wireless Communication for Automation

Authors: Vasuki Narasimha Swamy, Paul Rigge, Gireeja Ranade, Anant Sahai, Borivoje Nikolic

Abstract: Real-time applications require latencies on the order of a millisecond with very high reliabilities, paralleling the requirements for high-performance industrial control. Current wireless technologies like WiFi, Bluetooth, LTE, etc. are unable to meet these stringent latency and reliability requirements, forcing the use of wired systems. This paper introduces a wireless communication protocol based on network coding that in conjunction with cooperative communication techniques builds the necessary diversity to achieve the target reliability. The proposed protocol is analyzed using a communication theoretic delay-limited-capacity framework and compared to proposed protocols without network coding. The results show that for larger network sizes or payloads employing network coding lowers the minimum SNR required to achieve the target reliability. For a scenario inspired by an industrial printing application with $30$ nodes in the control loop, aggregate throughput of $4.8$ Mb/s, $20$MHz of bandwidth and cycle time under $2$ ms, the protocol can robustly achieve a system probability of error better than $10^{-9}$ with a nominal SNR less than $2$ dB under ideal channel conditions.

Submitted 14 March, 2018; originally announced March 2018.

Comments: A preliminary version of this work appeared at IEEE WCNC 2016

arXiv:1701.01894 [pdf, other]

Modeling Actuation Constraints for IoT Applications

Authors: Bharathan Balaji, Brad Campbell, Amit Levy, Xiaozhou Li, Addison Mayberry, Nirupam Roy, Vasuki Narasimha Swamy, Longqi Yang, Victor Bahl, Ranveer Chandra, Ratul Mahajan

Abstract: Internet of Things (IoT) promises to bring ease of monitoring, better efficiency and innovative services across many domains with connected devices around us. With information from critical parts of infrastructure and powerful cloud-based data analytics, many applications can be developed to gain insights about IoT systems as well as transform their capabilities. Actuation applications form an essential part of these IoT systems, as they enable automation as well as fast low-level decision making. However, modern IoT systems are designed for data acquisition, and actuation applications are implemented in an ad-hoc manner. We identify modeling constraints in a systematic manner as indispensable to support actuation applications because constraints encompass high-level policies dictated by laws of physics, legal policies, and user preferences. We explore data models for constraints in an IoT system with the example of a home heating system and illustrate the challenges in enforcing these constraints in the IoT system architecture.

Submitted 7 January, 2017; originally announced January 2017.

Comments: Microsoft Research Student Summit - Internet of Things Working Group

arXiv:1609.02968 [pdf, other]

Real-time Cooperative Communication for Automation over Wireless

Authors: Vasuki Narasimha Swamy, Sahaana Suri, Paul Rigge, Matthew Weiner, Gireeja Ranade, Anant Sahai, Borivoje Nikolic

Abstract: High-performance industrial automation systems rely on tens of simultaneously active sensors and actuators and have stringent communication latency and reliability requirements. Current wireless technologies like WiFi, Bluetooth, and LTE are unable to meet these requirements, forcing the use of wired communication in industrial control systems. This paper introduces a wireless communication protocol that capitalizes on multiuser diversity and cooperative communication to achieve the ultra-reliability with a low-latency constraint. Our protocol is analyzed using the communication-theoretic delay-limited-capacity framework and compared to baseline schemes that primarily exploit frequency diversity. For a scenario inspired by an industrial printing application with thirty nodes in the control loop, 20B messages transmitted between pairs of nodes and a cycle time of $2$ ms, an idealized protocol can achieve a cycle failure probability (probability that any packet in a cycle is not successfully delivered) lower than $10^{-9}$ with nominal SNR below 5 dB in a 20MHz wide channel.

Submitted 23 January, 2017; v1 submitted 9 September, 2016; originally announced September 2016.

Comments: A preliminary version of this work appeared at IEEE International Conference on Communications 2015

arXiv:1505.05711 [pdf, ps, other]

doi 10.1088/0022-3727/48/47/475002

Resistance minimum and electrical conduction mechanism in polycrystalline CoFeB thin films

Authors: G. Venkat Swamy, P. K. Rout, Manju Singh, R. K. Rakshit

Abstract: The temperature dependent resistance $R$($T$) of polycrystalline ferromagnetic CoFeB thin films of varying thickness are analyzed considering various electrical scattering processes. We observe a resistance minimum in $R$($T$) curves below $\simeq$ 29 K, which can be explained as an effect of intergranular Coulomb interaction in a granular system. The structural and Coulomb interaction related scattering processes contribute more as the film thickness decreases implying the role of disorder and granularity. Although the magnetic contribution to the resistance is the weakest compared to these two, it is the only thickness independent process. On the contrary, the negative coefficient of resistance can be explained by electron interaction effect in disordered amorphous films.

Submitted 26 October, 2015; v1 submitted 21 May, 2015; originally announced May 2015.

Journal ref: J. Phys. D Appl. Phys. 48, 475002 (2015)

arXiv:1310.2026 [pdf, other]

doi 10.1109/TIT.2015.2466635

Low-Complexity Interactive Algorithms for Synchronization from Deletions, Insertions, and Substitutions

Authors: Ramji Venkataramanan, Vasuki Narasimha Swamy, Kannan Ramchandran

Abstract: Consider two remote nodes having binary sequences $X$ and $Y$, respectively. $Y$ is an edited version of ${X}$, where the editing involves random deletions, insertions, and substitutions, possibly in bursts. The goal is for the node with $Y$ to reconstruct $X$ with minimal exchange of information over a noiseless link. The communication is measured in terms of both the total number of bits exchanged and the number of interactive rounds of communication. This paper focuses on the setting where the number of edits is $o(\tfrac{n}{\log n})$, where $n$ is the length of $X$. We first consider the case where the edits are a mixture of insertions and deletions (indels), and propose an interactive synchronization algorithm with near-optimal communication rate and average computational complexity of $O(n)$ arithmetic operations. The algorithm uses interaction to efficiently split the source sequence into substrings containing exactly one deletion or insertion. Each of these substrings is then synchronized using an optimal one-way synchronization code based on the single-deletion correcting channel codes of Varshamov and Tenengolts (VT codes). We then build on this synchronization algorithm in three different ways. First, it is modified to work with a single round of interaction. The reduction in the number of rounds comes at the expense of higher communication, which is quantified. Next, we present an extension to the practically important case where the insertions and deletions may occur in (potentially large) bursts. Finally, we show how to synchronize the sources to within a target Hamming distance. This feature can be used to differentiate between substitution and indel edits. In addition to theoretical performance bounds, we provide several validating simulation results for the proposed algorithms.

Submitted 12 September, 2015; v1 submitted 8 October, 2013; originally announced October 2013.

Journal ref: IEEE Transactions on Information Theory, vol. 61, no. 10, pp. 5670-5689, October 2015

arXiv:1305.7335 [pdf, ps, other]

doi 10.1063/1.4816811

Effect of Thermal Annealing on Boron Diffusion, Micro-structural, Electrical and Magnetic properties of Laser Ablated CoFeB Thin Films

Authors: G. Venkat Swamy, Himanshu Pandey, A. K. Srivastava, M. K. Dalai, K. K. Maurya, Rashmi, R. K. Rakshit

Abstract: We report on Boron diffusion and subsequent crystallization of Co$_{40}$Fe$_{40}$B$_{20}$ (CoFeB) thin films on SiO$_2$/Si(001) substrate using pulsed laser deposition. Secondary ion mass spectroscopy reveals Boron diffusion at the interface in both amorphous and crystalline phase of CoFeB. High-resolution transmission electron microscopy reveals a small fraction of nano-crystallites embedded in the amorphous matrix of CoFeB. However, annealing at 400$^\circ$C results in crystallization of CoFe with \textit{bcc} structure along (110) orientation. As-deposited films are non-metallic in nature with the coercivity (H$_c$) of 5Oe while the films annealed at 400$^\circ$C are metallic with a H$_c$ of 135Oe.

Submitted 31 May, 2013; originally announced May 2013.

Comments: 16 pages, 6 figures

Journal ref: AIP Advances 3, 072129 (2013)

arXiv:1210.3187 [pdf, ps, other]

An asymptotically optimal push-pull method for multicasting over a random network

Authors: Vasuki Narasimha Swamy, Srikrishna Bhashyam, Rajesh Sundaresan, Pramod Viswanath

Abstract: We consider allcast and multicast flow problems where either all of the nodes or only a subset of the nodes may be in session. Traffic from each node in the session has to be sent to every other node in the session. If the session does not consist of all the nodes, the remaining nodes act as relays. The nodes are connected by undirected links whose capacities are independent and identically distributed random variables. We study the asymptotics of the capacity region (with network coding) in the limit of a large number of nodes, and show that the normalized sum rate converges to a constant almost surely. We then provide a decentralized push-pull algorithm that asymptotically achieves this normalized sum rate without network coding.

Submitted 8 February, 2013; v1 submitted 11 October, 2012; originally announced October 2012.

Comments: 13 pages, extended version of paper presented at the IEEE International Symposium on Information Theory (ISIT) 2012, minor revision to text to address review comments, to appear in IEEE Transactions in information theory

arXiv:1003.5435 [pdf]

doi 10.5121/ijngn.2010.2104

Image Compression and Watermarking scheme using Scalar Quantization

Authors: Kilari Veera Swamy, B. Chandra Mohan, Y. V. Bhaskar Reddy, S. Srinivas Kumar

Abstract: This paper presents a new compression technique and image watermarking algorithm based on Contourlet Transform (CT). For image compression, an energy based quantization is used. Scalar quantization is explored for image watermarking. Double filter bank structure is used in CT. The Laplacian Pyramid (LP) is used to capture the point discontinuities, and then followed by a Directional Filter Bank (DFB) to link point discontinuities. The coefficients of down sampled low pass version of LP decomposed image are re-ordered in a pre-determined manner and prediction algorithm is used to reduce entropy (bits/pixel). In addition, the coefficients of CT are quantized based on the energy in the particular band. The superiority of proposed algorithm to JPEG is observed in terms of reduced blocking artifacts. The results are also compared with wavelet transform (WT). Superiority of CT to WT is observed when the image contains more contours. The watermark image is embedded in the low pass image of contourlet decomposition. The watermark can be extracted with minimum error. In terms of PSNR, the visual quality of the watermarked image is exceptional. The proposed algorithm is robust to many image attacks and suitable for copyright protection applications.

Submitted 29 March, 2010; originally announced March 2010.

Comments: 11 Pages, IJNGN Journal 2010

Journal ref: International Journal of Next-Generation Networks 2.1 (2010) 37-47

arXiv:0907.1464 [pdf]

Cotunnite-structured titanium dioxide: the hardest known oxide

Authors: L. S. Dubrovinsky, N. A. Dubrovinskaia, V. Swamy, J. Muscat, N. M. Harrison, R. Ahuja, B. Holm

Abstract: Despite great technological importance and many investigations, a material with measured hardness comparable to that of diamond or cubic boron nitride has yet to be identified. Combined theoretical and experimental investigations led to the discovery of a new polymorph of titanium dioxide with titanium nine-coordinated to oxygen in the cotunnite (PbCl2) structure. Hardness measurements on the cotunnite-structured TiO2 synthesized at pressures above 60 GPa and temperatures above 1000 K reveal that this material is the hardest oxide yet discovered. Furthermore, it is one of the least compressible (with a measured bulk modulus of 431 GPa) and hardest (with a microhardness of 38 GPa) polycrystalline materials studied thus far.

Submitted 9 July, 2009; originally announced July 2009.

Comments: This is full version of the paper published as Brief Communications in Nature, 410, 653-654

arXiv:gr-qc/9405069 [pdf, ps, other]

doi 10.1007/BF02105076

Detection of Computer Generated Gravitational Waves in Numerical Cosmologies

Authors: B. K. Berger, D. Garfinkle, V. Swamy

Abstract: We propose to study the behavior of complicated numerical solutions to Einstein's equations for generic cosmologies by following the geodesic motion of a swarm of test particles. As an example, we consider a cylinder of test particles initially at rest in the plane symmetric Gowdy universe on $T^3 \times R$. For a circle of test particles in the symmetry plane, the geodesic equations predict evolution of the circle into distortions and rotations of an ellipse as well as motion perpendicular to the plane. The evolutionary sequence of ellipses depends on the initial position of the circle of particles. We display snapshots of the evolution of the cylinder.

Submitted 27 May, 1994; originally announced May 1994.

Comments: 15 pages Plain TeX, 9 pages of figures available on request by FAX or mail

Journal ref: Gen.Rel.Grav. 27 (1995) 511-527

Showing 1–24 of 24 results for author: Swamy, V