APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data point in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.10290 [pdf, other]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2402.15506 [pdf, other]

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Authors: Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

Abstract: Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce AgentOhana as a comprehensive solution to address these challenges. AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios. It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Leveraging the data unification, our training pipeline maintains equilibrium across different data sources and preserves independent randomness across devices during dataset partitioning and model training. Additionally, we present xLAM-v0.1, a large action model tailored for AI agents, which demonstrates exceptional performance across various benchmarks. Begin the exploration at https://github.com/SalesforceAIResearch/xLAM.

Submitted 20 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: Add GitHub repo link at https://github.com/SalesforceAIResearch/xLAM and HuggingFace model link at https://huggingface.co/Salesforce/xLAM-v0.1-r

arXiv:2401.15006 [pdf, other]

Airavata: Introducing Hindi Instruction-tuned LLM

Authors: Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, Ratish Puduppully, Mitesh M. Khapra, Raj Dabre, Rudra Murthy, Anoop Kunchukuttan

Abstract: We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages. You can access all artifacts at https://ai4bharat.github.io/airavata.

Submitted 26 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Work in progress

arXiv:2401.08434 [pdf, other]

Distributed IRSs Always Benefit Every Mobile Operator

Authors: L. Yashvanth, Chandra R. Murthy

Abstract: We investigate the impact of multiple distributed intelligent reflecting surfaces (IRSs), which are deployed and optimized by a mobile operator (MO), on the performance of user equipments (UEs) served by other co-existing out-of-band (OOB) MOs that do not control the IRSs. We show that, under round-robin scheduling, in mmWave frequencies, the ergodic sum spectral efficiency (SE) of an OOB MO is monotonic in the total number of IRS elements with a pre-log factor that depends on the channel properties of the OOB UE. We further show that the maximum achievable SE of OOB MO scales log-linearly in IRS elements. Then, by specifying the minimum number of IRSs as a function of the channel parameters, we design a distributed IRS system in which an OOB MO almost surely obtains the maximum SE. Finally, we prove that the outage probability at an OOB UE decreases exponentially in the number of IRSs, even though they are randomly configured from the UE's viewpoint. We numerically verify our theory and conclude that distributed IRSs always help every MO, but the MO controlling the IRSs benefits the most.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 5 pages

arXiv:2401.07078 [pdf, other]

PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities

Authors: Settaluri Lakshmi Sravanthi, Meet Doshi, Tankala Pavan Kalyan, Rudra Murthy, Pushpak Bhattacharyya, Raj Dabre

Abstract: LLMs have demonstrated remarkable capability for understanding semantics, but they often struggle with understanding pragmatics. To demonstrate this fact, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks in four pragmatics phenomena, namely, Implicature, Presupposition, Reference, and Deixis. We curated high-quality test sets for each task, consisting of Multiple Choice Question Answers (MCQA). PUB includes a total of 28k data points, 6.1k of which have been created by us, and the rest are adapted from existing datasets. We evaluated nine models varying in the number of parameters and type of training. Our study indicates that fine-tuning for instruction-following and chat significantly enhances the pragmatics capabilities of smaller language models. However, for larger models, the base versions perform comparably with their chat-adapted counterparts. Additionally, there is a noticeable performance gap between human capabilities and model capabilities. Furthermore, unlike the consistent performance of humans across various tasks, the models demonstrate variability in their proficiency, with performance levels fluctuating due to different hints and the complexities of tasks within the same dataset. Overall, the benchmark aims to provide a comprehensive evaluation of LLMs' ability to handle real-world language tasks that require pragmatic reasoning.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2312.01364 [pdf, other]

Tradeoff of age-of-information and power under reliability constraint for short-packet communication with block-length adaptation

Authors: Sudarsanan A. K., Vineeth B. S., Chandra R. Murthy

Abstract: In applications such as remote estimation and monitoring, update packets are transmitted by power-constrained devices using short-packet codes over wireless networks. Therefore, networks need to be end-to-end optimized using information freshness metrics such as age of information under transmit power and reliability constraints to ensure support for such applications. For short-packet coding, modelling and understanding the effect of block codeword length on transmit power and other performance metrics is important. To understand the above optimization for short-packet coding, we consider the optimal tradeoff problem between age of information and transmit power under reliability constraints for short packet point-to-point communication model with an exogenous packet generation process. In contrast to prior work, we consider scheduling policies that can possibly adapt the block-length or transmission time of short packet codes in order to achieve the optimal tradeoff. We characterize the tradeoff using a semi-Markov decision process formulation. We also obtain analytical upper bounds as well as numerical, analytical, and asymptotic lower bounds on the optimal tradeoff. We show that in certain regimes, such as high reliability and high packet generation rate, non-adaptive scheduling policies (fixed transmission time policies) are close-to-optimal. Furthermore, in a high-power or in a low-power regime, non-adaptive as well as state-independent randomized scheduling policies are order-optimal. These results are corroborated by numerical and simulation experiments. The tradeoff is then characterized for a wireless point-to-point channel with block fading as well as for other packet generation models (including an age-dependent packet generation model).

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2308.14033 [pdf, other]

doi 10.1109/TCOMM.2024.3412708

On the Impact of an IRS on the Out-of-Band Performance in Sub-6 GHz & mmWave Frequencies

Authors: L. Yashvanth, Chandra R. Murthy

Abstract: Intelligent reflecting surfaces (IRSs) were introduced to enhance the performance of wireless communication systems. However, from a service provider's viewpoint, a concern with the use of an IRS is its effect on out-of-band (OOB) quality of service. Specifically, if two operators, say X and Y, provide services in a given geographical area using non-overlapping frequency bands, and if operator X uses an IRS to enhance the spectral efficiency (SE) of its users (UEs), does it degrade the performance of UEs served by operator Y? We answer this by analyzing the average and instantaneous performances of the OOB operator considering both sub-6 GHz and mmWave bands. Specifically, we derive the ergodic sum SE achieved by the operators under round-robin scheduling. We also derive the outage probability and analyze the change in the SNR caused by the IRS at an OOB UE using stochastic dominance theory. Surprisingly, even though the IRS is randomly configured from operator Y's point of view, the OOB operator still benefits from the presence of the IRS, witnessing a performance enhancement for free in both sub-6 GHz and mmWave bands. This is because the IRS introduces additional paths between the transmitter and receiver, increasing the overall signal power arriving at the UE and providing diversity benefits. Finally, we show that the use of opportunistic scheduling schemes can further enhance the benefit of the uncontrolled IRS at OOB UEs. We numerically illustrate our findings and conclude that an IRS is always beneficial to every operator, even when the IRS is deployed & controlled by only one operator.

Submitted 10 June, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

Comments: Accepted for publication in IEEE Transactions on Communications

arXiv:2308.05960 [pdf, other]

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to resolve complex tasks by conditioning on past interactions such as observations and actions. Since the investigation of LAA is still very recent, limited explorations are available. Therefore, we provide a comprehensive comparison of LAA in terms of both agent architectures and LLM backbones. Additionally, we propose a new strategy to orchestrate multiple LAAs such that each labor LAA focuses on one type of action, i.e. BOLAA, where a controller manages the communication among multiple agents. We conduct simulations on both decision-making and multi-step reasoning environments, which comprehensively justify the capacity of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures and the optimal choice of LLMs, as well as the compatibility of both. We release our implementation code of LAAs to the public at https://github.com/salesforce/BOLAA.

Submitted 11 August, 2023; originally announced August 2023.

Comments: Preprint

arXiv:2308.02151 [pdf, other]

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Authors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective-oriented multi-step tasks on their own, rather than merely responding to queries from human users. Most existing language agents, however, are not optimized using environment-specific rewards. Although some agents enable iterative refinement through verbal feedback, they do not reason and plan in ways that are compatible with gradient-based learning from rewards. This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model, which automatically tunes the language agent prompts from environment feedback through policy gradient. Specifically, our proposed agent architecture learns from rewards across multiple environments and tasks, for fine-tuning a pre-trained language model which refines the language agent prompt by summarizing the root cause of prior failed attempts and proposing action plans. Experimental results on various tasks demonstrate that the language agents improve over time and that our approach considerably outperforms baselines that do not properly leverage gradients from the environment. This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.

Submitted 5 May, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.08962 [pdf, other]

REX: Rapid Exploration and eXploitation for AI Agents

Authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models while it does not require any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thoughts (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate comparable performance and, in certain cases, even surpass the results achieved by these existing techniques. Notably, REX-based methods exhibit remarkable reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.

Submitted 26 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2305.11790 [pdf, other]

Prompting with Pseudo-Code Instructions

Authors: Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V, Danish Contractor, Srikanth Tamilselvam

Abstract: Prompting with natural language instructions has recently emerged as a popular method of harnessing the capabilities of large language models. Given the inherent ambiguity present in natural language, it is intuitive to consider the possible advantages of prompting with less ambiguous prompt styles, such as the use of pseudo-code. In this paper we explore if prompting via pseudo-code instructions helps improve the performance of pre-trained language models. We manually create a dataset of pseudo-code prompts for 132 different tasks spanning classification, QA and generative language tasks, sourced from the Super-NaturalInstructions dataset. Using these prompts along with their counterparts in natural language, we study their performance on two LLM families - BLOOM and CodeGen. Our experiments show that using pseudo-code instructions leads to better results, with an average increase (absolute) of 7-16 points in F1 scores for classification tasks and an improvement (relative) of 12-38% in aggregate ROUGE-L scores across all tasks. We include detailed ablation studies which indicate that code comments, docstrings, and the structural clues encoded in pseudo-code all contribute towards the improvement in performance. To the best of our knowledge our work is the first to demonstrate how pseudo-code prompts can be helpful in improving the performance of pre-trained LMs.

Submitted 19 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Published in EMNLP 2023 main track

arXiv:2305.06161 [pdf, other]

StarCoder: may the source be with you!

Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2303.01191 [pdf, other]

Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT

Authors: Tamali Banerjee, Rudra Murthy V, Pushpak Bhattacharyya

Abstract: We aim to investigate whether UNMT approaches with self-supervised pre-training are robust to word-order divergence between language pairs. We achieve this by comparing two models pre-trained with the same self-supervised pre-training objective. The first model is trained on language pairs with different word-orders, and the second model is trained on the same language pairs with source language re-ordered to match the word-order of the target language. Ideally, UNMT approaches which are robust to word-order divergence should exhibit no visible performance difference between the two configurations. In this paper, we investigate two such self-supervised pre-training based UNMT approaches, namely Masked Sequence-to-Sequence Pre-Training (MASS), which does not have shuffling noise, and Denoising AutoEncoder (DAE), which has shuffling noise. We experiment with five English→Indic language pairs (i.e., en-hi, en-bn, en-gu, en-kn, and en-ta), where the word-order of the source language is SVO (Subject-Verb-Object) and the word-order of the target languages is SOV (Subject-Object-Verb). We observed that for these language pairs, the DAE-based UNMT approach consistently outperforms MASS in terms of translation accuracies. Moreover, bridging the word-order gap using reordering improves the translation accuracy of MASS-based UNMT models, while it cannot improve the translation accuracy of DAE-based UNMT models. This observation indicates that DAE-based UNMT is more robust to word-order divergence than MASS-based UNMT. Word-shuffling noise in the DAE approach could be the possible reason for the approach being robust to word-order divergence.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.12616 [pdf, other]

Does an IRS Degrade Out-of-Band Performance?

Authors: L. Yashvanth, Chandra R. Murthy

Abstract: Intelligent reflecting surfaces (IRSs) were introduced to enhance the performance of wireless systems. However, from a cellular service provider's view, a concern with the use of an IRS is its effect on out-of-band (OOB) quality of service. Specifically, given two operators, say X and Y, providing services in a geographical area using non-overlapping frequency bands, if operator-X uses an IRS to optimally enhance the throughput of its users, does the IRS degrade the performance of operator-Y? We answer this by deriving the ergodic sum spectral efficiency (SE) of both operators under round-robin scheduling. We also derive the complementary cumulative distribution function of the change in effective channel at an OOB user with and without the IRS, which provides deeper insights into OOB performance. Surprisingly, we find that even though the IRS is randomly configured from operator-Y's view, the OOB operator still benefits from the IRS, witnessing a performance enhancement for free. This happens because the IRS introduces additional paths between the nodes, increasing the signal power at the receiver and providing diversity benefits. We verify our findings numerically and conclude that an IRS is beneficial to every operator, even when the IRS is deployed to optimally serve only one operator.

Submitted 30 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: Accepted for presentation in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) 2023

arXiv:2302.12489 [pdf, other]

Channel State Information Based User Censoring in Irregular Repetition Slotted Aloha

Authors: Chirag Ramesh Srivatsa, Chandra R. Murthy

Abstract: Irregular repetition slotted aloha (IRSA) is a massive random access protocol which can be used to serve a large number of users while achieving a packet loss rate (PLR) close to zero. However, if the number of users is too high, then the system is interference limited and the PLR is close to one. In this paper, we propose a variant of IRSA in the interference limited regime, namely Censored-IRSA (C-IRSA), wherein users with poor channel states censor themselves from transmitting their packets. We theoretically analyze the throughput performance of C-IRSA via density evolution. Using this, we derive closed-form expressions for the optimal choice of the censor threshold which maximizes the throughput while achieving zero PLR among uncensored users. Through extensive numerical simulations, we show that C-IRSA can achieve a 4× improvement in the peak throughput compared to conventional IRSA.

Submitted 24 February, 2023; originally announced February 2023.

Comments: Accepted at IEEE ICC 2023

arXiv:2301.01015 [pdf, other]

Semi-Structured Object Sequence Encoders

Authors: Rudra Murthy V, Riyaz Bhat, Chulaka Gunasekara, Siva Sankalp Patel, Hui Wan, Tejas Indulal Dhamecha, Danish Contractor, Marina Danilevsky

Abstract: In this paper we explore the task of modeling semi-structured object sequences; in particular, we focus our attention on the problem of developing a structure-aware input representation for such sequences. Examples of such data include user activity on websites, machine logs, and many others. This type of data is often represented as a sequence of sets of key-value pairs over time and can present modeling challenges due to an ever-increasing sequence length. We propose a two-part approach, which first considers each key independently and encodes a representation of its values over time; we then self-attend over these value-aware key representations to accomplish a downstream task. This allows us to operate on longer object sequences than existing methods. We introduce a novel shared-attention-head architecture between the two modules and present an innovative training schedule that interleaves the training of both modules with shared weights for some attention heads. Our experiments on multiple prediction tasks using real-world data demonstrate that our approach outperforms a unified network with hierarchical encoding, as well as other methods including a record-centric representation and a flattened representation of the sequence.

Submitted 22 May, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

arXiv:2212.10168 [pdf, other]

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

Authors: Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan

Abstract: We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. The dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 languages. The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language translation. We also create manually annotated testsets for 9 languages. We demonstrate the utility of the obtained dataset on the Naamapadam-test dataset. We also release IndicNER, a multilingual IndicBERT model fine-tuned on the Naamapadam training set. IndicNER achieves an F1 score of more than $80$ for $7$ out of $9$ test languages. The dataset and models are available under open-source licences at https://ai4bharat.iitm.ac.in/naamapadam.

Submitted 28 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2209.02777 [pdf, other]

Impact of Mobility on Downlink Cell-Free Massive MIMO Systems

Authors: Abhinav Anand, Chandra R. Murthy, Ribhu Chopra

Abstract: In this paper, we analyze the achievable downlink spectral efficiency of cell-free massive multiple input multiple output (CF-mMIMO) systems, accounting for the effects of channel aging (caused by user mobility) and pilot contamination. We consider two cases, one where user equipments (UEs) rely on downlink pilots beamformed by the access points (APs) to estimate the downlink channel, and another where UEs utilize statistical channel state information (CSI) for data decoding. For comparison, we also consider cellular mMIMO and derive its achievable spectral efficiency with channel aging and pilot contamination in the above two cases. Our results show that, in CF-mMIMO, downlink training is preferable over statistical CSI when the length of the data sequence is chosen optimally to maximize the spectral efficiency. In cellular mMIMO, however, either one of the two schemes may be better depending on whether user fairness or sum spectral efficiency is prioritized. Furthermore, the CF-mMIMO system generally outperforms cellular mMIMO even after accounting for the effects of channel aging and pilot contamination. Through numerical results, we illustrate the effect of various system parameters such as the maximum user velocity, uplink/downlink pilot lengths, data duration, and network densification, and provide interesting insights into the key differences between cell-free and cellular mMIMO systems.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2205.07026 [pdf, other]

doi 10.1109/SPAWC51304.2022.9833941

Performance Analysis of Irregular Repetition Slotted Aloha with Multi-Cell Interference

Authors: Chirag Ramesh Srivatsa, Chandra R. Murthy

Abstract: Irregular repetition slotted aloha (IRSA) is a massive random access protocol in which users transmit several replicas of their packet over a frame to a base station. Existing studies have analyzed IRSA in the single-cell (SC) setup, which does not extend to the more practically relevant multi-cell (MC) setup due to the inter-cell interference. In this work, we analyze MC IRSA, accounting for pilot contamination and multiuser interference. Via numerical simulations, we illustrate that, in practical settings, MC IRSA can have a drastic loss of throughput, up to $70\%$, compared to SC IRSA. Further, MC IRSA requires a significantly higher training length (about 4-5x compared to SC IRSA), in order to support the same user density and achieve the same throughput. We also provide insights into the impact of the pilot length, number of antennas, and signal to noise ratio on the performance of MC IRSA.

Submitted 28 May, 2022; v1 submitted 14 May, 2022; originally announced May 2022.

Comments: Accepted at IEEE SPAWC 2022

Journal ref: IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (2022), 1-5

arXiv:2204.13743 [pdf, other]

HiNER: A Large Hindi Named Entity Recognition Dataset

Authors: Rudra Murthy, Pallab Bhattacharjee, Rahul Sharnagat, Jyotsana Khatri, Diptesh Kanojia, Pushpak Bhattacharyya

Abstract: Named Entity Recognition (NER) is a foundational NLP task that aims to provide class labels like Person, Location, Organisation, Time, and Number to words in free text. Named Entities can also be multi-word expressions where the additional I-O-B annotation information helps label them during the NER annotation process. While English and European languages have considerable annotated data for the NER task, Indian languages lack on that front -- both in terms of quantity and following annotation standards. This paper releases a significantly sized standard-abiding Hindi NER dataset containing 109,146 sentences and 2,220,856 tokens, annotated with 11 tags. We discuss the dataset statistics in all their essential detail and provide an in-depth analysis of the NER tag-set used with our data. The statistics of tag-set in our dataset show a healthy per-tag distribution, especially for prominent classes like Person, Location and Organisation. Since the proof of resource-effectiveness is in building models with the resource and testing the model on benchmark data and against the leader-board entries in shared tasks, we do the same with the aforesaid data. We use different language models to perform the sequence labelling task for NER and show the efficacy of our data by performing a comparative evaluation with models trained on another dataset available for the Hindi NER task. Our dataset helps achieve a weighted F1 score of 88.78 with all the tags and 92.22 when we collapse the tag-set, as discussed in the paper.
To the best of our knowledge, no available dataset meets the standards of volume (amount) and variability (diversity), as far as Hindi NER is concerned. We fill this gap through this work, which we hope will significantly help NLP for Hindi. We release this dataset with our code and models at https://github.com/cfiltnlp/HiNER

Submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted at LREC 2022, 8 pages

arXiv:2203.06313 [pdf, other]

doi 10.1109/TSP.2023.3265887

Performance Analysis of Intelligent Reflecting Surface Assisted Opportunistic Communications

Authors: L. Yashvanth, Chandra R. Murthy

Abstract: Intelligent reflecting surfaces (IRSs) are a promising technology for enhancing coverage and spectral efficiency, both in the sub-6 GHz and the millimeter wave (mmWave) bands. Existing approaches to leverage the benefits of IRS involve the use of a resource-intensive channel estimation step followed by a computationally expensive algorithm to optimize the reflection coefficients at the IRS. In this work, focusing on the sub-6 GHz band of communications, we present and analyze several alternative schemes, where the phase configuration of the IRS is randomized and multi-user diversity is exploited to opportunistically select the best user at each point in time for data transmission. We show that the throughput of an IRS assisted opportunistic communication (OC) system asymptotically converges to the optimal beamforming-based throughput under fair allocation of resources, as the number of users gets large. We also introduce schemes that enhance the rate of convergence of the OC rate to the beamforming rate with the number of users. For all the proposed schemes, we derive the scaling law of the throughput in terms of the system parameters, as the number of users gets large. Following this, we extend the setup to wideband channels via an orthogonal frequency division multiplexing (OFDM) system and discuss two OC schemes in an IRS assisted setting that clearly elucidate the superior performance that IRS aided OC systems can offer over conventional systems, at very low implementation cost and complexity.

Submitted 24 October, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

Comments: 17 pages, 9 figures

Journal ref: IEEE Transactions on Signal Processing, vol. 71, pp. 2056-2070, 2023

arXiv:2112.07242 [pdf, other]

doi 10.1109/TSP.2022.3186539

On the Impact of Channel Estimation on the Design and Analysis of IRSA based Systems

Authors: Chirag Ramesh Srivatsa, Chandra R. Murthy

Abstract: Irregular repetition slotted aloha (IRSA) is a distributed grant-free random access protocol where users transmit multiple replicas of their packets to a base station (BS). The BS recovers the packets using successive interference cancellation. In this paper, we first derive channel estimates for IRSA, exploiting the sparsity structure of IRSA transmissions, when non-orthogonal pilots are employed across users to facilitate channel estimation at the BS. Allowing for the use of non-orthogonal pilots is important, as the length of orthogonal pilots scales linearly with the total number of devices, leading to prohibitive overhead as the number of devices increases. Next, we present a novel analysis of the throughput of IRSA under practical channel estimation errors, with the use of multiple antennas at the BS. Finally, we theoretically characterize the asymptotic throughput performance of IRSA using a density evolution based analysis. Simulation results underline the importance of accounting for channel estimation errors in analyzing IRSA, which can even lead to 70% loss in performance in severely interference-limited regimes. We also provide novel insights on the effect of parameters such as pilot length, SNR, number of antennas at the BS, etc., on the system throughput.

Submitted 23 June, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted at IEEE Transactions on Signal Processing, June 2022

Journal ref: IEEE Transactions on Signal Processing, Volume 70, June 2022, 4186-4200

arXiv:2111.06140 [pdf, ps, other]

doi 10.1109/TSP.2022.3185891

User Activity Detection for Irregular Repetition Slotted Aloha based MMTC

Authors: Chirag Ramesh Srivatsa, Chandra R. Murthy

Abstract: Irregular repetition slotted aloha (IRSA) is a grant-free random access protocol for massive machine-type communications, where a large number of users sporadically send their data packets to a base station (BS). IRSA is a completely distributed multiple access protocol: in any given frame, a small subset of the users, i.e., the active users, transmit replicas of their packet in randomly selected resource elements (REs). The first step in the decoding process at the BS is to detect which users are active in each frame. To this end, a novel Bayesian user activity detection (UAD) algorithm is developed, which exploits both the sparsity in user activity as well as the underlying structure of IRSA-based transmissions. Next, the Cramer-Rao bound (CRB) on the mean squared error in channel estimation is derived. It is empirically shown that the channel estimates obtained as a by-product of the proposed UAD algorithm achieve the CRB. Then, the signal to interference plus noise ratio achieved by the users is analyzed, accounting for UAD, channel estimation errors, and pilot contamination. The impact of these non-idealities on the throughput of IRSA is illustrated via Monte Carlo simulations. For example, in a system with 1500 users and 10% of the users being active per frame, a pilot length of as low as 20 symbols is sufficient for accurate user activity detection. In contrast, using classical compressed sensing approaches for UAD would require a pilot length of about 346 symbols.
The results also reveal crucial insights into the dependence of UAD errors and throughput on parameters such as the length of the pilot sequence, the number of antennas at the BS, the number of users, and the signal to noise ratio.

Submitted 23 June, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

Comments: Accepted at IEEE Transactions on Signal Processing, June 2022

Journal ref: IEEE Transactions on Signal Processing, Volume 70, June 2022, 3616-3631

arXiv:2111.04975 [pdf, other]

Evaluation Of Orthogonal Chirp Division Multiplexing For Automotive Integrated Sensing And Communications

Authors: Sangeeta Bhattacharjee, Kumar Vijay Mishra, Ramesh Annavajjala, Chandra R. Murthy

Abstract: We consider a bistatic vehicular integrated sensing and communications (ISAC) system that employs the recently proposed orthogonal chirp division multiplexing (OCDM) multicarrier waveform. As a stand-alone communications waveform, OCDM has been shown to be robust against the interference in time-frequency selective channels. In a bistatic ISAC, we exploit this property to develop efficient receive processing algorithms that achieve high target resolution as well as high communications rate. We derive statistical bounds for our proposed Sequential symbol decoding and radar parameter estimation (SUNDAE) algorithm and compare its competitive performance with other multicarrier waveforms through numerical experiments.

Submitted 9 November, 2021; originally announced November 2021.

Comments: Submitted to ICASSP 2022

arXiv:2110.09968 [pdf, ps, other]

Can Dynamic TDD Enabled Half-Duplex Cell-Free Massive MIMO Outperform Full-Duplex Cellular Massive MIMO?

Authors: Anubhab Chowdhury, Ribhu Chopra, Chandra R. Murthy

Abstract: We consider a dynamic time division duplex (DTDD) enabled cell-free massive multiple-input multiple-output (CF-mMIMO) system, where each half-duplex (HD) access point (AP) is scheduled to operate in the uplink (UL) or downlink (DL) mode based on the data demands of the user equipments (UEs), with the goal of maximizing the sum UL-DL spectral efficiency (SE). We develop a new, low complexity, greedy algorithm for the combinatorial AP scheduling problem, with an optimality guarantee theoretically established via showing that a lower bound of the sum UL-DL SE is sub-modular. We also consider pilot sequence reuse among the UEs to limit the channel estimation overhead. In CF systems, all the APs estimate the channel from every UE, making pilot allocation problem different from the cellular case. We develop a novel algorithm that iteratively minimizes the maximum pilot contamination across the UEs. We compare the performance of our solutions, both theoretically and via simulations, against a full duplex (FD) multi-cell mMIMO system. Our results show that, due to the joint processing of the signals at the central processing unit, CF-mMIMO with dynamic HD AP-scheduling significantly outperforms cellular FD-mMIMO in terms of the sum SE and 90% likely SE. Thus, DTDD enabled HD CF-mMIMO is a promising alternative to cellular FD-mMIMO, without the cost of hardware for self-interference suppression.

Submitted 21 May, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: Accepted, IEEE Transactions on Communications

Journal ref: IEEE Transactions on Communications, May, 2022

arXiv:2109.10534 [pdf, other]

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages

Authors: Tejas Indulal Dhamecha, Rudra Murthy V, Samarth Bharadwaj, Karthik Sankaranarayanan, Pushpak Bhattacharyya

Abstract: We explore the impact of leveraging the relatedness of languages that belong to the same family in NLP models using multilingual fine-tuning. We hypothesize and validate that multilingual fine-tuning of pre-trained language models can yield better performance on downstream NLP applications, compared to models fine-tuned on individual languages. A first of its kind detailed study is presented to track performance change as languages are added to a base language in a graded and greedy (in the sense of best boost of performance) manner, which reveals that careful selection of a subset of related languages can improve performance significantly more than utilizing all related languages. The Indo-Aryan (IA) language family is chosen for the study, the exact languages being Bengali, Gujarati, Hindi, Marathi, Oriya, Punjabi and Urdu. The script barrier is crossed by simple rule-based transliteration of the text of all languages to Devanagari. Experiments are performed on mBERT, IndicBERT, MuRIL and two RoBERTa-based LMs, the last two being pre-trained by us. Low resource languages, such as Oriya and Punjabi, are found to be the largest beneficiaries of multilingual fine-tuning. Textual Entailment, Entity Classification, Section Title Prediction, tasks of IndicGLUE and POS tagging form our test bed. Compared to monolingual fine tuning we get relative performance improvement of up to 150% in the downstream tasks.
The surprise take-away is that for any language there is a particular combination of other languages which yields the best performance, and any additional language is in fact detrimental.

Submitted 22 September, 2021; originally announced September 2021.

Comments: Accepted in EMNLP 2021

arXiv:2107.00618 [pdf, other]

doi 10.1109/TNSM.2021.3091053

Resilient and Latency-aware Orchestration of Network Slices Using Multi-connectivity in MEC-enabled 5G Networks

Authors: Prabhu Kaliyammal Thiruvasagam, Abhishek Chakraborty, C Siva Ram Murthy

Abstract: Network slicing (NS) and multi-access edge computing (MEC) are new paradigms which play key roles in 5G and beyond networks. NS allows network operators (NOs) to divide the available network resources into multiple logical NSs for providing dedicated virtual networks tailored to the specific service/business requirements. MEC enables NOs to provide diverse ultra-low latency services for supporting the needs of different industry verticals by moving computing facilities to the network edge. NS can be constructed by instantiating a set of virtual network functions (VNFs) on top of MEC cloud servers for provisioning diverse latency-sensitive communication services (e.g., autonomous driving and augmented reality) on demand at a lesser cost and time. However, VNFs, MEC cloud servers, and communication links are subject to failures due to software bugs, misconfiguration, overloading, hardware faults, cyber attacks, power outage, and natural/man-made disaster. Failure of a critical network component disrupts services abruptly and leads to users' dissatisfaction, which may result in revenue loss for the NOs. In this paper, we present a novel approach based on multi-connectivity in 5G networks to tackle this problem and our proposed approach is resilient against i) failure of VNFs, ii) failure of local servers within MEC, iii) failure of communication links, and iv) failure of an entire MEC cloud facility in regional level.
To this end, we formulate the problem as a binary integer programming (BIP) model in order to optimally deploy NSs with the minimum cost, and prove it is NP-hard. To overcome time complexity, we propose an efficient genetic algorithm based heuristic to obtain near-optimal solution in polynomial time. By extensive simulations, we show that our proposed approach not only reduces resource wastage, but also improves throughput while providing high resiliency against failures.

Submitted 1 July, 2021; originally announced July 2021.

arXiv:2106.09849 [pdf, other]

Latency-aware and Survivable Mapping of VNFs in 5G Network Edge Cloud

Authors: Prabhu Kaliyammal Thiruvasagam, Abhishek Chakraborty, C. Siva Ram Murthy

Abstract: Network Functions Virtualization (NFV) and Multi-access Edge Computing (MEC) play crucial roles in 5G networks for dynamically provisioning diverse communication services with heterogeneous service requirements. In particular, while NFV improves flexibility and scalability by softwarizing physical network functions as Virtual Network Functions (VNFs), MEC enables the provision of delay-sensitive/time-critical services by moving computing facilities to the network edge. However, these new paradigms introduce challenges in terms of latency, availability, and resource allocation. In this paper, we first explore MEC cloud facility location selection and then latency-aware placement of VNFs in different selected locations of NFV enabled MEC cloud facilities in order to meet the ultra-low latency requirements of different applications (e.g., Tactile Internet, virtual reality, and mission-critical applications). Furthermore, we also aim to guarantee the survivability of VNFs and an edge server against failures in resource limited MEC cloud facility due to software bugs, configuration faults, etc. To this end, we formulate the problem of latency-aware and survivable mapping of VNFs in different MEC cloud facilities as an Integer Linear Program (ILP) to minimize the overall service provisioning cost, and show that the problem is NP-hard. Owing to the high computational complexity of solving the ILP, we propose a simulated annealing based heuristic algorithm to obtain near-optimal solution in polynomial time.
With extensive simulations, we show the effectiveness of our proposed solution in a real-world network topology, which performs close to the optimal solution.

Submitted 17 June, 2021; originally announced June 2021.

arXiv:2106.04995 [pdf, other]

Crosslingual Embeddings are Essential in UNMT for Distant Languages: An English to IndoAryan Case Study

Authors: Tamali Banerjee, Rudra Murthy V, Pushpak Bhattacharyya

Abstract: Recent advances in Unsupervised Neural Machine Translation (UNMT) have minimized the gap between supervised and unsupervised machine translation performance for closely related language pairs. However, the situation is very different for distant language pairs. Lack of lexical overlap and low syntactic similarities such as between English and Indo-Aryan languages leads to poor translation quality in existing UNMT systems. In this paper, we show that initializing the embedding layer of UNMT models with cross-lingual embeddings shows significant improvements in BLEU score over existing approaches with embeddings randomly initialized. Further, static embeddings (freezing the embedding layer weights) lead to better gains compared to updating the embedding layer weights during training (non-static). We experimented using Masked Sequence to Sequence (MASS) and Denoising Autoencoder (DAE) UNMT approaches for three distant language pairs. The proposed cross-lingual embedding initialization yields BLEU score improvement of as much as ten times over the baseline for English-Hindi, English-Bengali, and English-Gujarati. Our analysis shows the importance of cross-lingual embedding, comparisons between approaches, and the scope of improvements in these systems.

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2105.09855 [pdf, ps, other]

doi 10.1109/TSP.2022.3169957

Multiple Support Recovery Using Very Few Measurements Per Sample

Authors: Lekshmi Ramesh, Chandra R. Murthy, Himanshu Tyagi

Abstract: In the problem of multiple support recovery, we are given access to linear measurements of multiple sparse samples in $\mathbb{R}^{d}$. These samples can be partitioned into $\ell$ groups, with samples having the same support belonging to the same group. For a given budget of $m$ measurements per sample, the goal is to recover the $\ell$ underlying supports, in the absence of the knowledge of group labels. We study this problem with a focus on the measurement-constrained regime where $m$ is smaller than the support size $k$ of each sample. We design a two-step procedure that estimates the union of the underlying supports first, and then uses a spectral algorithm to estimate the individual supports. Our proposed estimator can recover the supports with $m<k$ measurements per sample, from $\tilde{O}(k^{4}\ell^{4}/m^{4})$ samples. Our guarantees hold for a general, generative model assumption on the samples and measurement matrices. We also provide results from experiments conducted on synthetic data and on the MNIST dataset.

Submitted 20 May, 2021; originally announced May 2021.

arXiv:2104.10404 [pdf, ps, other]

Uplink Performance Analysis of Cell-Free mMIMO Systems under Channel Aging

Authors: Ribhu Chopra, Chandra R. Murthy, Anastasios K. Papazafeiropoulos

Abstract: In this paper, we investigate the impact of channel aging on the uplink performance of a cell-free (CF) massive multiple-input multiple-output (mMIMO) system with a minimum mean squared error (MMSE) receiver. To this end, we present a new model for the temporal evolution of the channel, which allows the channel to age at different rates at different access points (APs). Under this setting, we derive the deterministic equivalent of the per-user achievable signal-to-interference-plus-noise ratio (SINR). In addition to validating the theoretical expressions, our simulation results reveal that, at low user mobilities, the SINR of CF-mMIMO is nearly 5 dB higher than its cellular counterpart with the same number of antennas, and about 8 dB higher than that of an equivalent small-cell network with the same number of APs. On the other hand, at very high user velocities, and when the channels between the UEs and the different APs age at the same rate, the relative impact of aging is higher for CF-mMIMO compared to cellular mMIMO. However, when the channel ages at the different APs with different rates, the effect of aging on CF-mMIMO is marginally mitigated, especially for larger frame durations.

Submitted 21 April, 2021; originally announced April 2021.

Comments: 5 pages, 3 figures, Accepted in IEEE Communications Letters

arXiv:2102.11258 [pdf, other]

Cognitively Aided Zero-Shot Automatic Essay Grading

Authors: Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Pushpak Bhattacharyya

Abstract: Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the prompt. Zero-shot AEG is when we train a system to grade essays written to a new prompt which was not present in our training data. In this paper, we describe a solution to the problem of zero-shot automatic essay grading, using cognitive information, in the form of gaze behaviour. Our experiments show that using gaze behaviour helps in improving the performance of AEG systems, especially when we provide a new essay written in response to a new prompt for scoring, by an average of almost 5 percentage points of QWK.

Submitted 22 February, 2021; originally announced February 2021.

Comments: This paper was accepted for publication at ICON 2020: The 17th International Conference on Natural Language Processing, on December 20, 2021

arXiv:2102.00235 [pdf, other]

Phase Transitions for Support Recovery from Gaussian Linear Measurements

Authors: Lekshmi Ramesh, Chandra R. Murthy, Himanshu Tyagi

Abstract: We study the problem of recovering the common $k$-sized support of a set of $n$ samples of dimension $d$, using $m$ noisy linear measurements per sample. Most prior work has focused on the case when $m$ exceeds $k$, in which case $n$ of the order $(k/m)\log(d/k)$ is both necessary and sufficient. Thus, in this regime, only the total number of measurements across the samples matters, and there is not much benefit in getting more than $k$ measurements per sample. In the measurement-constrained regime where we have access to fewer than $k$ measurements per sample, we show an upper bound of $O((k^{2}/m^{2})\log d)$ on the sample complexity for successful support recovery when $m\ge 2\log d$. Along with the lower bound from our previous work, this shows a phase transition for the sample complexity of this problem around $k/m=1$. In fact, our proposed algorithm is sample-optimal in both regimes. It follows that, in the $m\ll k$ regime, multiple measurements from the same sample are more valuable than measurements from different samples.

Submitted 12 May, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

arXiv:2012.07051 [pdf, other]

doi 10.1109/TCC.2020.3020269

A Reliability-Aware, Delay Guaranteed, and Resource Efficient Placement of Service Function Chains in Softwarized 5G Networks

Authors: Prabhu Kaliyammal Thiruvasagam, Vijeth J. Kotagi, C. Siva Ram Murthy

Abstract: Network Functions Virtualization (NFV) allows flexibility, scalability, agility, and easy manageability of networks by leveraging the features of virtualization and cloud computing technologies. However, softwarization of network functions imposes many challenges. Reliability and latency are major challenges in NFV-enabled 5G networks that can lead to customer dissatisfaction and revenue loss. In general, redundancy is used to improve the reliability of communication services. However, redundancy requires the same amount of additional resources and thus increases cost. In this work, we address the reliability-aware, delay guaranteed, and resource efficient Service Function Chain (SFC) placement problem in softwarized 5G networks. First, we propose a novel SFC subchaining method to enhance the reliability of an SFC without backups. If the reliability requirement is not met after subchaining, we add backups to VNFs to meet it. Then, we formulate the reliable SFC placement problem as an Integer Linear Programming (ILP) problem in order to solve it optimally. Owing to the high computational complexity of the ILP problem for large input instances, we propose a modified stable matching algorithm to provide a near-optimal solution in polynomial time. By extensive simulations, we show that our proposed solutions consume fewer physical resources compared to state-of-the-art solutions for provisioning reliable communication services.

Submitted 13 December, 2020; originally announced December 2020.

Comments: 17 pages

arXiv:2010.06062 [pdf, other]

An Elastic IoT Device Management Platform

Authors: Rakesh Dhakshina Murthy, Mingming Liu

Abstract: With the recent advancement of technologies over the past year, IoT has become a paradigm in which devices communicate with each other and the cloud to achieve various applications in multidisciplinary fields. However, developing, deploying, and experimenting with IoT applications are still tedious, expensive, and time-consuming due to factors like heterogeneity of hardware and software. This is where an IoT testbed plays a vital role in aiding developers to test their applications without deploying them to the target environment. In this paper, we present a testbed that is scalable for heterogeneous devices and mainly focused on small-scale and medium-scale IoT applications. This testbed would be best suited for testing applications which demand robustness, remote monitoring and control, incorporation of heterogeneous devices, location tracking of devices, and easy troubleshooting with security and internet connectivity concerns. The testbed also has a feature to work with limited access to the internet. A detailed explanation of the design and architecture of the proposed testbed is provided. We also present a conceptual prototype of the testbed and the results obtained on experimenting under various conditions.

Submitted 12 October, 2020; originally announced October 2020.

Comments: This paper has been accepted as a regular paper to the IEEE GCAIoT 2021 conference

arXiv:2005.12078 [pdf, other]

Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour

Authors: Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Abhijit Mishra, Pushpak Bhattacharyya

Abstract: The gaze behaviour of a reader is helpful in solving several NLP tasks such as automatic essay grading. However, collecting gaze behaviour from readers is costly in terms of time and money. In this paper, we propose a way to improve automatic essay grading using gaze behaviour, which is learnt at run time using a multi-task learning framework. To demonstrate the efficacy of this multi-task learning based approach to automatic essay grading, we collect gaze behaviour for 48 essays across 4 essay sets, and learn gaze behaviour for the rest of the essays, numbering over 7000 essays. Using the learnt gaze behaviour, we can achieve a statistically significant improvement in performance over the state-of-the-art system for the essay sets where we have gaze data. We also achieve a statistically significant improvement for 4 other essay sets, numbering about 6000 essays, where we have no gaze behaviour data available. Our approach establishes that learning gaze behaviour improves automatic essay grading.

Submitted 1 February, 2021; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: This paper was accepted for publication at AACL-IJCNLP 2020

arXiv:2005.11994 [pdf]

doi 10.3233/TAD-200264

Eye Gaze Controlled Robotic Arm for Persons with SSMI

Authors: Vinay Krishna Sharma, L. R. D. Murthy, KamalPreet Singh Saluja, Vimal Mollyn, Gourav Sharma, Pradipta Biswas

Abstract: Background: People with severe speech and motor impairment (SSMI) often use a technique called eye pointing to communicate with the outside world. One of their parents, caretakers or teachers holds a printed board in front of them and interprets their intentions by analyzing their eye gaze manually. This technique is often error prone and time consuming and depends on a single caretaker. Objective: We aimed to automate the eye tracking process electronically by using a commercially available tablet, computer or laptop, without requiring any dedicated hardware for eye gaze tracking. The eye gaze tracker is used to develop a video see-through based AR (augmented reality) display that controls a robotic device with eye gaze and is deployed for a fabric printing task. Methodology: We undertook a user centred design process and separately evaluated the webcam based gaze tracker and the video see-through based human robot interaction involving users with SSMI. We also reported a user study on manipulating a robotic arm with the webcam based eye gaze tracker. Results: Using our bespoke eye gaze controlled interface, able bodied users can select one of nine regions of the screen at a median of less than 2 secs and users with SSMI can do so at a median of 4 secs. Using the eye gaze controlled human-robot AR display, users with SSMI could undertake a representative pick and drop task at an average duration of less than 15 secs and reach a randomly designated target within 60 secs using a COTS eye tracker and at an average time of 2 mins using the webcam based eye gaze tracker.

Submitted 25 May, 2020; originally announced May 2020.

Comments: Citation: VK Sharma, KPS Saluja, LRD Murthy, G Sharma and P Biswas, Webcam Controlled Robotic Arm for Persons with SSMI, Technology and Disability 32 (3), IOS Press 2020 [Official journal of EU AAATE association]

ACM Class: I.4; I.2; H.5.2; K.4

arXiv:1912.11247 [pdf, ps, other]

Sample-Measurement Tradeoff in Support Recovery under a Subgaussian Prior

Authors: Lekshmi Ramesh, Chandra R Murthy, Himanshu Tyagi

Abstract: Data samples from $\mathbb{R}^{d}$ with a common support of size $k$ are accessed through $m$ random linear projections (measurements) per sample. It is well-known that roughly $k$ measurements from a single sample are sufficient to recover the support. In the multiple sample setting, do $k$ overall measurements still suffice when only $m$ measurements per sample are allowed, with $m<k$? We answer this question in the negative by considering a generative model setting with independent samples drawn from a subgaussian prior. We show that $n=\Theta((k^2/m^2)\cdot\log k(d-k))$ samples are necessary and sufficient to recover the support exactly. In turn, this shows that when $m<k$, $k$ overall measurements are insufficient for support recovery; instead we need about $m$ measurements each from $k^{2}/m^2$ samples, i.e., $k^{2}/m$ overall measurements are necessary.

Submitted 19 September, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

Comments: A preliminary version of this paper appeared at IEEE International Symposium on Information Theory 2019
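
As a rough worked instance of the tradeoff stated in the abstract above, take $k=100$ and $m=10$. These values are illustrative assumptions, not figures from the paper, and the logarithmic factor and constants hidden by $\Theta(\cdot)$ are dropped:

```latex
\begin{align*}
n &\approx \frac{k^{2}}{m^{2}} = \frac{100^{2}}{10^{2}} = 100 \ \text{samples},\\
n \cdot m &\approx \frac{k^{2}}{m} = \frac{100^{2}}{10} = 1000 \ \text{overall measurements} \;>\; k = 100 .
\end{align*}
```

Under these assumed values, restricting each sample to a tenth of $k$ measurements raises the overall measurement count by about a factor of $k/m = 10$ relative to the single-sample case.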

arXiv:1911.01212 [pdf, other]

Scrambled Translation Problem: A Problem of Denoising UNMT

Authors: Tamali Banerjee, Rudra Murthy V, Pushpak Bhattacharyya

Abstract: In this paper, we identify an interesting kind of error in the output of Unsupervised Neural Machine Translation (UNMT) systems like Undreamt. We refer to this error type as the Scrambled Translation problem. We observe that UNMT models which use word shuffle noise (as in the case of Undreamt) can generate correct words, but fail to stitch them together to form phrases. As a result, words of the translated sentence look scrambled, resulting in decreased BLEU. We hypothesise that the reason behind the scrambled translation problem is the 'shuffling noise' which is introduced in every input sentence as a denoising strategy. To test our hypothesis, we experiment by retraining UNMT models with a simple retraining strategy. We stop the training of the Denoising UNMT model after a pre-decided number of iterations and resume the training for the remaining iterations, whose number is also pre-decided, using the original sentence as input without adding any noise. Our proposed solution achieves significant performance improvement over UNMT models that train conventionally. We demonstrate these performance gains on four language pairs, viz., English-French, English-German, English-Spanish, Hindi-Punjabi. Our qualitative and quantitative analysis shows that the retraining strategy helps achieve better alignment as observed by attention heatmap and better phrasal translation, leading to statistically significant improvement in BLEU scores.

Submitted 17 June, 2021; v1 submitted 30 October, 2019; originally announced November 2019.

Comments: Accepted by MT Summit 2021

arXiv:1909.08185 [pdf, ps, other]

Learned-SBL: A Deep Learning Architecture for Sparse Signal Recovery

Authors: Rubin Jose Peter, Chandra R. Murthy

Abstract: In this paper, we present a computationally efficient sparse signal recovery scheme using Deep Neural Networks (DNN). The architecture of the introduced neural network is inspired by sparse Bayesian learning (SBL) and named Learned-SBL (L-SBL). We design a common architecture to recover sparse as well as block sparse vectors from single measurement vector (SMV) or multiple measurement vectors (MMV) depending on the nature of the training data. In the MMV model, the L-SBL network can be trained to learn any underlying sparsity pattern among the vectors including joint sparsity, block sparsity, etc. In particular, for block sparse recovery, learned-SBL does not require any prior knowledge of block boundaries. In each layer of the L-SBL, an estimate of the signal covariance matrix is obtained as the output of a neural network. Then a maximum a posteriori (MAP) estimator of the unknown sparse vector is implemented with non-trainable parameters. In many applications, the measurement matrix may be time-varying. The existing DNN based sparse signal recovery schemes demand the retraining of the neural network using the current measurement matrix. The architecture of L-SBL allows it to accept the measurement matrix as an input to the network, and thereby avoids the need for retraining. We also evaluate the performance of Learned-SBL in the detection of an extended target using a multiple-input multiple-output (MIMO) radar. Simulation results illustrate that the proposed approach offers superior sparse recovery performance compared to the state-of-the-art methods.

Submitted 17 September, 2019; originally announced September 2019.

Comments: 13 pages, 22 figures

arXiv:1904.05864 [pdf, ps, other]

doi 10.1109/LNET.2019.2902720

The More the Merrier: Enhancing Reliability of 5G Communication Services with Guaranteed Delay

Authors: Prabhu Kaliyammal Thiruvasagam, Vijeth J Kotagi, C Siva Ram Murthy

Abstract: Although network functions virtualization and software-defined networking offer many dynamic features such as flexibility, scalability, and programmability for easy provisioning of services at a lesser cost and time through service function chaining, they introduce new challenges in terms of reliability, availability, and latency of services. Particularly, softwarization of network and service functions (e.g., virtualization, anything as a service, dynamic virtual chaining, and routing) imposes a higher possibility of network failures due to software issues than due to hardware. In this letter, we propose a novel solution called eRESERV to enhance the reliability of service chains in 5G while meeting the service level agreements.

Submitted 11 April, 2019; originally announced April 2019.

Comments: 4 pages, 10 figures

arXiv:1811.00383 [pdf, other]

Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages

Authors: Rudra Murthy V, Anoop Kunchukuttan, Pushpak Bhattacharyya

Abstract: Transfer learning approaches for Neural Machine Translation (NMT) train an NMT model on the assisting-target language pair (parent model) which is later fine-tuned for the source-target language pair of interest (child model), with the target language being the same. In many cases, the assisting language has a different word order from the source language. We show that divergent word order adversely limits the benefits from transfer learning when little to no parallel corpus between the source and target language is available. To bridge this divergence, we propose to pre-order the assisting language sentence to match the word order of the source language and train the parent model. Our experiments on many language pairs show that bridging the word order gap leads to significant improvement in the translation quality.

Submitted 10 April, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

Comments: Accepted as Short Paper at NAACL 2019

arXiv:1801.03813 [pdf, ps, other]

Energy Harvesting Communications Using Dual Alternating Batteries

Authors: Rajshekhar Vishweshwar Bhat, Mehul Motani, Chandra R Murthy, Rahul Vaze

Abstract: Practical energy harvesting (EH) based communication systems typically use a battery to temporarily store the harvested energy prior to its use for communication. The batteries can be damaged when they are repeatedly charged (discharged) after being partially discharged (charged), overcharged or deeply discharged. This motivates the cycle constraint which says that a battery must be charged (discharged) only after it is sufficiently discharged (charged). We also assume Bernoulli energy arrivals, and a half-duplex constraint due to which the batteries are not charged and discharged simultaneously. In this context, we study EH communication systems with: (a) a single-battery with capacity 2B units and (b) dual-batteries, each having capacity of B units. The aim is to obtain the best possible long-term average throughputs and throughput regions in point-to-point (P2P) channels and multiple access channels (MAC), respectively. For the P2P channel, we obtain an analytical optimal solution in the single-battery case, and propose optimal and sub-optimal power allocation policies for the dual-battery case. We extend these policies to obtain achievable throughput regions in MACs by jointly allocating rates and powers. From numerical simulations, we find that the optimal throughput in the dual-battery case is significantly higher than that in the single-battery case, although the total storage capacity in both cases is 2B units. Further, in the proposed policies, the largest throughput region in the single-battery case is contained within that of the dual-battery case.

Submitted 18 December, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

Comments: A single battery case is added and its performance is compared with that of the dual-battery case, with additional simulation results

arXiv:1709.05789 [pdf, other]

doi 10.1109/TSP.2017.2781652

On the Restricted Isometry of the Columnwise Khatri-Rao Product

Authors: Saurabh Khanna, Chandra R Murthy

Abstract: The columnwise Khatri-Rao product of two matrices is an important matrix type, reprising its role as a structured sensing matrix in many fundamental linear inverse problems. Robust signal recovery in such inverse problems is often contingent on proving the restricted isometry property (RIP) of a certain system matrix expressible as a Khatri-Rao product of two matrices. In this work, we analyze the RIP of a generic columnwise Khatri-Rao product matrix by deriving two upper bounds for its $k^{\text{th}}$ order Restricted Isometry Constant ($k$-RIC) for different values of $k$. The first RIC bound is computed in terms of the individual RICs of the input matrices participating in the Khatri-Rao product. The second RIC bound is probabilistic, and is specified in terms of the input matrix dimensions. We show that the Khatri-Rao product of a pair of $m \times n$ sized random matrices comprising independent and identically distributed subgaussian entries satisfies $k$-RIP with arbitrarily high probability, provided $m$ exceeds $O(k \log n)$. Our RIC bounds confirm that the Khatri-Rao product exhibits stronger restricted isometry compared to its constituent matrices for the same RIP order. The proposed RIC bounds are potentially useful in the sample complexity analysis of several sparse recovery problems.

Submitted 23 July, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

Comments: The probabilistic RIC bounds presented as Theorems 2 and 3 in the earlier version (v1 and v2) could be potentially incorrect and have been restated with new proofs in this revised manuscript
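
For readers unfamiliar with the object studied in the entry above, the following is a minimal sketch of the columnwise Khatri-Rao product using the standard definition: column j of the result is the Kronecker product of column j of each input. The function name `khatri_rao` and the list-of-rows representation are assumptions made for illustration; nothing here is taken from the paper's own code or proofs.

```python
def khatri_rao(a, b):
    """Columnwise Khatri-Rao product of a (m x n) and b (p x n).

    Matrices are plain lists of rows; the result has m*p rows and n columns.
    """
    m, p = len(a), len(b)
    n = len(a[0])
    if any(len(row) != n for row in a) or any(len(row) != n for row in b):
        raise ValueError("both matrices must have the same number of columns")
    # Row i*p + r holds a[i][j] * b[r][j] in column j, so column j of the
    # result is the Kronecker product of column j of a with column j of b.
    return [[a[i][j] * b[r][j] for j in range(n)]
            for i in range(m) for r in range(p)]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
C = khatri_rao(A, B)
for row in C:
    print(row)
```

With two 2 x 2 inputs the result is 4 x 2: the row count multiplies while the column count stays fixed, which is the structure that makes the product useful as a sensing matrix.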

arXiv:1706.04442 [pdf, other]

On Distributed Power Control for Uncoordinated Dual Energy Harvesting Links: Performance Bounds and Near-Optimal Policies

Authors: Mohit K. Sharma, Chandra R. Murthy, Rahul Vaze

Abstract: In this paper, we consider a point-to-point link between an energy harvesting transmitter and receiver, where neither node has the information about the battery state or energy availability at the other node. We consider a model where data is successfully delivered only in slots where both nodes are active. Energy loss occurs whenever one node turns on while the other node is in sleep mode. In each slot, based on their own energy availability, the transmitter and receiver need to independently decide whether or not to turn on, with the aim of maximizing the long-term time-average throughput. We present an upper bound on the throughput achievable by analyzing a genie-aided system that has noncausal knowledge of the energy arrivals at both the nodes. Next, we propose an online policy requiring an occasional one-bit feedback whose throughput is within one bit of the upper bound, asymptotically in the battery size. In order to further reduce the feedback required, we propose a time-dilated version of the online policy. As the time dilation gets large, this policy does not require any feedback and achieves the upper bound asymptotically in the battery size. Inspired by this, we also propose a near-optimal fully uncoordinated policy. We use Monte Carlo simulations to validate our theoretical results and illustrate the performance of the proposed policies.

Submitted 14 June, 2017; originally announced June 2017.

Comments: 8 pages

arXiv:1706.00149 [pdf, other]

Analysis of Full-Duplex Downlink Using Diversity Gain

Authors: Chandan Pradhan, Garimella Rama Murthy

Abstract: The paper carries out a performance analysis of a multiuser full-duplex (FD) communication system. Multiple FD UEs share the same spectrum resources, simultaneously, at both the uplink and downlink. This results in co-channel interference (CCI) at the downlink of a UE from uplink signals of other UEs. This work proposes the use of diversity gain at the receiver to mitigate the effects of the CCI. For this, an architecture for the FD eNB and FD UE is proposed and the corresponding downlink operation is described. Finally, the performance of the system is studied in terms of the downlink capacity of a UE. It is shown that through the deployment of a sufficient number of transmit and receive antennas at the eNB and UEs, respectively, significant improvement in performance can be achieved in the presence of CCI.

Submitted 31 May, 2017; originally announced June 2017.

Comments: Under review at Wireless Personal Communication

arXiv:1703.04930 [pdf, ps, other]

On the Support Recovery of Jointly Sparse Gaussian Sources using Sparse Bayesian Learning

Authors: Saurabh Khanna, Chandra R. Murthy

Abstract: In this work, we provide non-asymptotic, probabilistic guarantees for successful recovery of the common nonzero support of jointly sparse Gaussian sources in the multiple measurement vector (MMV) problem. The support recovery problem is formulated as the marginalized maximum likelihood (or type-II ML) estimation of the variance hyperparameters of a joint sparsity inducing Gaussian prior on the source signals. We derive conditions under which the resulting nonconvex constrained optimization perfectly recovers the nonzero support of a joint-sparse Gaussian source ensemble with arbitrarily high probability. The support error probability decays exponentially with the number of MMVs at a rate that depends on the smallest restricted singular value and the nonnegative null space property of the self Khatri-Rao product of the sensing matrix. Our analysis confirms that nonzero supports of size as high as $O(m^2)$ are recoverable from $m$ measurements per sparse vector. Our derived sufficient conditions for support consistency of the proposed constrained type-II ML solution also guarantee the support consistency of any global solution of the multiple sparse Bayesian learning (M-SBL) optimization whose nonzero coefficients lie inside a bounded interval. For the case of noiseless measurements, we further show that a single MMV is sufficient for perfect recovery of the $k$-sparse support by M-SBL, provided all subsets of $k + 1$ columns of the sensing matrix are linearly independent.

Submitted 26 July, 2021; v1 submitted 15 March, 2017; originally announced March 2017.

arXiv:1611.08358 [pdf]

Kannada Spell Checker with Sandhi Splitter

Authors: A N Akshatha, Chandana G Upadhyaya, Rajashekara S Murthy

Abstract: Spelling errors are introduced in text either during typing, or when the user does not know the correct phoneme or grapheme. If a language contains complex words like sandhi where two or more morphemes join based on some rules, spell checking becomes very tedious. In such situations, having a spell checker with sandhi splitter which alerts the user by flagging the errors and providing suggestions is very useful. A novel algorithm of sandhi splitting is proposed in this paper. The sandhi splitter can split about 7000 most common sandhi words in Kannada language used as test samples. The sandhi splitter was integrated with a Kannada spell checker and a mechanism for generating suggestions was added. A comprehensive, platform independent, standalone spell checker with sandhi splitter application software was thus developed and tested extensively for its efficiency and correctness. A comparative analysis of this spell checker with sandhi splitter was made and results concluded that the Kannada spell checker with sandhi splitter has an improved performance. It is twice as fast, 200 times more space efficient, and it is 90% accurate in case of complex nouns and 50% accurate for complex verbs. Such a spell checker with sandhi splitter will be of foremost significance in machine translation systems, voice processing, etc. This is the first sandhi splitter in Kannada and the advantage of the novel algorithm is that it can be extended to all Indian languages.

Submitted 25 November, 2016; originally announced November 2016.

Comments: 7 pages, 10 figures

arXiv:1607.00198 [pdf, other]

Sharing Network Parameters for Crosslingual Named Entity Recognition

Authors: Rudra Murthy V, Mitesh Khapra, Pushpak Bhattacharyya

Abstract: Most state of the art approaches for Named Entity Recognition rely on hand crafted features and annotated corpora. Recently Neural network based models have been proposed which do not require handcrafted features but still require annotated corpora. However, such annotated corpora may not be available for many languages. In this paper, we propose a neural network based model which allows sharing the decoder as well as word and character level parameters between two languages thereby allowing a resource fortunate language to aid a resource deprived language. Specifically, we focus on the case when limited annotated corpora is available in one language ($L_1$) and abundant annotated corpora is available in another language ($L_2$). Sharing the network architecture and parameters between $L_1$ and $L_2$ leads to improved performance in $L_1$. Further, our approach does not require any hand crafted features but instead directly learns meaningful feature representations from the training data itself. We experiment with 4 language pairs and show that indeed in a resource constrained setup (lesser annotated corpora), a model jointly trained with data from another language performs better than a model trained only on the limited corpora in one language.

Submitted 1 July, 2016; originally announced July 2016.

Showing 1–50 of 91 results for author: Murthy, R