-
Band degeneracy, resonant level formation and low thermal conductivity in dilute In and Ga co-doped thermoelectric compound SnTe
Authors:
Gaurav Jamwal,
Ankit Kumar,
Mohd Warish,
Shruti Chakravarty,
Saravanan Muthiah,
Asokan Kandasami,
Asad Niazi
Abstract:
We report the effect of co-doping of In and Ga at low concentrations on the structural, electronic, and thermoelectric properties of SnTe based compositions $Sn_{1.03-2x}In_{x}Ga_{x}Te$ (x = 0, 0.01, 0.02, 0.04) prepared by the solid-state route and spark plasma sintering (SPS). All compositions formed in the fcc structure (Fm-3m) with no other impurity phase. The optical band gap increased with the co-doping, indicative of band convergence effects. First principle electronic structure calculations showed band convergence and the formation of resonant levels, due to Ga and In doping respectively. The carrier concentration increased on hole-doping by In and Ga ions while carrier mobility decreased due to impurity scattering. The resistivity increased with temperature, indicative of the degenerate semiconducting character of the compounds. The Seebeck coefficient of the doped samples increased linearly with temperature, reaching 85 - 95 $μ$V/K at 783 K. Thermal conductivity decreased sharply with co-doping, and the lattice thermal conductivity dropped to 0.42 W$m^{-1}$ $K^{-1}$ above 750 K. The enhanced power factor and low lattice thermal conductivity on doping resulted in a maximum figure of merit ZT = 0.34 at 773 K, twice that of the pristine SnTe.
Submitted 3 May, 2023;
originally announced May 2023.
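For readers unfamiliar with the quantities named in the SnTe abstract above, the standard textbook definition of the thermoelectric figure of merit is sketched below. The symbols are the conventional ones and are not taken from the paper itself: $S$ is the Seebeck coefficient, $\rho$ the electrical resistivity, $T$ the absolute temperature, and $\kappa_e$, $\kappa_L$ the electronic and lattice contributions to the thermal conductivity.

```latex
\[
  ZT \;=\; \frac{S^{2}\,T}{\rho\,\kappa}
     \;=\; \frac{\mathrm{PF}\cdot T}{\kappa_{e} + \kappa_{L}},
  \qquad
  \mathrm{PF} \;=\; \frac{S^{2}}{\rho}
\]
```

Under this definition, a higher power factor PF and a lower lattice term $\kappa_L$ both raise ZT, which matches the two mechanisms the abstract credits for its reported maximum.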
-
Scrutinizing Shipment Records To Thwart Illegal Timber Trade
Authors:
Debanjan Datta,
Sathappan Muthiah,
John Simeone,
Amelia Meadows,
Naren Ramakrishnan
Abstract:
Timber and forest products made from wood, like furniture, are valuable commodities, and like the global trade of many highly-valued natural resources, face challenges of corruption, fraud, and illegal harvesting. These grey and black market activities in the wood and forest products sector are not limited to the countries where the wood was harvested, but extend throughout the global supply chain and have been tied to illicit financial flows, like trade-based money laundering, document fraud, species mislabeling, and other illegal activities. The task of finding such fraudulent activities using trade data, in the absence of ground truth, can be modelled as an unsupervised anomaly detection problem. However, existing approaches suffer from certain shortcomings in their applicability to large-scale trade data. Trade data is heterogeneous, with both categorical and numerical attributes in a tabular format. The overall challenge lies in the complexity, volume and velocity of data, with a large number of entities and a lack of ground truth labels. To mitigate these, we propose a novel unsupervised anomaly detection method, Contrastive Learning based Heterogeneous Anomaly Detection (CHAD), that is generally applicable to large-scale heterogeneous tabular data. We demonstrate that our model CHAD performs favorably against multiple comparable baselines on public benchmark datasets, and outperforms them in the case of trade data. More importantly, we demonstrate that our approach reduces the assumptions and effort required for hyperparameter tuning, which is a key challenging aspect in an unsupervised training paradigm. Specifically, our overarching objective pertains to detecting suspicious timber shipments and patterns using Bill of Lading trade record data. Detecting anomalous transactions in shipment records can enable further investigation by government agencies and supply chain constituents.
Submitted 31 July, 2022;
originally announced August 2022.
-
Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions
Authors:
Raquib Bin Yousuf,
Subhodip Biswas,
Kulendra Kumar Kaushal,
James Dunham,
Rebecca Gelles,
Sathappan Muthiah,
Nathan Self,
Patrick Butler,
Naren Ramakrishnan
Abstract:
Understanding key insights from full-text scholarly articles is essential as it enables us to determine interesting trends, give insight into research and development, and build knowledge graphs. However, some of the interesting key insights are only available when considering full-text. Although researchers have made significant progress in information extraction from short documents, extraction of scientific entities from full-text scholarly literature remains a challenging problem. This work presents an automated End-to-end Research Entity Extractor called EneRex to extract technical facets such as dataset usage, objective task, and method from full-text scholarly research articles. Additionally, we extract three novel facets: links to source code, computing resources, and programming language/libraries from full-text articles. We demonstrate how EneRex is able to extract key insights and trends from a large-scale dataset in the domain of computer science. We further test our pipeline on multiple datasets and find that EneRex improves upon a state-of-the-art model. We highlight how the existing datasets are limited in their capacity and how EneRex may fit into an existing knowledge graph. We also present a detailed discussion with pointers for future research. Our code and data are publicly available at https://github.com/DiscoveryAnalyticsCenter/EneRex.
Submitted 8 July, 2022;
originally announced July 2022.
-
Exo-SIR: An Epidemiological Model to Analyze the Impact of Exogenous Spread of Infection
Authors:
Nirmal Kumar Sivaraman,
Manas Gaur,
Shivansh Baijal,
Sakthi Balan Muthiah,
Amit Sheth
Abstract:
Epidemics like Covid-19 and Ebola have impacted people's lives significantly. The mobility of people across countries or states has had a significant impact on the spread of epidemics. The spread of disease due to factors local to the population under consideration is termed the endogenous spread. The spread due to external factors like migration, mobility, etc. is called the exogenous spread. In this paper, we introduce the Exo-SIR model, an extension of the popular SIR model, and a few variants of the model. The novelty in our model is that it captures both the exogenous and endogenous spread of the virus. First, we present an analytical study. Second, we simulate the Exo-SIR model with and without assuming a contact network for the population. Third, we implement the Exo-SIR model on real datasets regarding Covid-19 and Ebola. We found that endogenous infection is influenced by exogenous infection. Furthermore, we found that the Exo-SIR model predicts the peak time better than the SIR model. Hence, the Exo-SIR model would be helpful for governments to plan policy interventions at the time of a pandemic.
Submitted 3 May, 2022;
originally announced May 2022.
-
Using AntiPatterns to avoid MLOps Mistakes
Authors:
Nikhil Muralidhar,
Sathappah Muthiah,
Patrick Butler,
Manish Jain,
Yu Yu,
Katy Burne,
Weipeng Li,
David Jones,
Prakash Arunachalam,
Hays 'Skip' McCormick,
Naren Ramakrishnan
Abstract:
We describe lessons learned from developing and deploying machine learning models at scale across the enterprise in a range of financial analytics applications. These lessons are presented in the form of antipatterns. Just as design patterns codify best software engineering practices, antipatterns provide a vocabulary to describe defective practices and methodologies. Here we catalog and document numerous antipatterns in financial ML operations (MLOps). Some antipatterns are due to technical errors, while others are due to not having sufficient knowledge of the surrounding context in which ML results are used. By providing a common vocabulary to discuss these situations, our intent is that antipatterns will support better documentation of issues, rapid communication between stakeholders, and faster resolution of problems. In addition to cataloging antipatterns, we describe solutions, best practices, and future directions toward MLOps maturity.
Submitted 30 June, 2021;
originally announced July 2021.
-
Detecting Anomalies Through Contrast in Heterogeneous Data
Authors:
Debanjan Datta,
Sathappan Muthiah,
Naren Ramakrishnan
Abstract:
Detecting anomalies has been a fundamental approach in detecting potentially fraudulent activities. Tasked with detecting illegal timber trade, which threatens ecosystems and economies and is associated with other illegal activities, we formulate our problem as one of anomaly detection. Among other challenges, annotations that could assist in building automated systems to detect fraudulent transactions are unavailable for our large-scale trade data with heterogeneous features (categorical and continuous). Modelling the task as unsupervised anomaly detection, we propose a novel model, Contrastive Learning based Heterogeneous Anomaly Detector, to address shortcomings of prior models. Our model uses an asymmetric autoencoder that can effectively handle large arity categorical variables, but avoids assumptions about the structure of data in low-dimensional latent space and is robust to changes in hyper-parameters. The likelihood of data is approximated through an estimator network, which is jointly trained with the autoencoder, using negative sampling. Further, the details and intuition for an effective negative sample generation approach for heterogeneous data are outlined. We provide a qualitative study to showcase the effectiveness of our model in detecting anomalies in timber trade.
Submitted 2 April, 2021;
originally announced April 2021.
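The abstract above mentions negative sampling for heterogeneous records but does not describe the procedure. The following is only an illustrative sketch of one common way to build a negative sample from a tabular record: replace a few categorical fields with different values from the same column. The function name `make_negative`, the `domains` mapping, and the field names in the usage note are hypothetical and are not taken from the paper.

```python
import random


def make_negative(record, domains, n_perturb=1, rng=random):
    """Return a perturbed copy of `record` to serve as a negative sample.

    `domains` maps each categorical field name to its list of allowed values.
    `n_perturb` categorical fields are chosen at random and each is replaced
    by a *different* value from its domain. Fields not listed in `domains`
    (for example numeric ones) are copied unchanged. This is an assumed,
    simplified scheme, not the authors' generator.
    """
    fields = rng.sample(sorted(domains), n_perturb)
    negative = dict(record)  # shallow copy so the original stays intact
    for field in fields:
        alternatives = [v for v in domains[field] if v != record[field]]
        negative[field] = rng.choice(alternatives)
    return negative
```

A call such as `make_negative({"origin": "BR", "carrier": "A", "weight_kg": 1200.0}, {"origin": ["BR", "PE"], "carrier": ["A", "B"]})` would return a record that differs from the input in exactly one categorical field. Passing a seeded `random.Random` instance as `rng` makes the perturbation reproducible.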
-
Differential Tracking Across Topical Webpages of Indian News Media
Authors:
Yash Vekaria,
Vibhor Agarwal,
Pushkal Agarwal,
Sangeeta Mahapatra,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislation in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personally identifiable information leakage methods that first- and third-parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on the Global North markets such as the EU and the USA. Large markets such as India, which covers 20% of the world population and has no explicit privacy laws, have not been studied in this regard.
We aim to address these gaps and focus on the following research questions: Is tracking on topical subpages of Indian news websites different from their homepage? Do third-party trackers prefer to track specific topics? How does this preference compare to the similarity of content shown on these topical subpages? To answer these questions, we propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We study the identified topical subpages and compare them with their homepages with respect to the intensity of cookie injection and third-party embeddedness and type. We find differential user tracking among subpages, and between subpages and homepages. We also find a preferential attachment of third-party trackers to specific topics. Also, embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
Submitted 7 March, 2021;
originally announced March 2021.
-
Under the Spotlight: Web Tracking in Indian Partisan News Websites
Authors:
Vibhor Agarwal,
Yash Vekaria,
Pushkal Agarwal,
Sangeeta Mahapatra,
Shounak Set,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis on the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of the Reporters without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. However, through cookie synchronization, more user IDs are synchronized in Left websites than Right or Centre. Canvas fingerprinting is used similarly by Left and Right, and less by Centre. Invisible pixel-based tracking is 50% more intense in Centre-leaning websites than Right, and 25% more than Left. Desktop versions of news websites deliver more cookies than their mobile counterparts. A handful of third-parties are tracking users in most websites in this study. This paper, by demonstrating intense web tracking, has implications for research on overall privacy of users visiting partisan news websites in India.
Submitted 8 March, 2021; v1 submitted 6 February, 2021;
originally announced February 2021.
-
Exo-SIR: An Epidemiological Model to Analyze the Impact of Exogenous Infection of COVID-19 in India
Authors:
Nirmal Kumar Sivaraman,
Manas Gaur,
Shivansh Baijal,
Ch V Radha Sai Rupesh,
Sakthi Balan Muthiah,
Amit Sheth
Abstract:
Epidemiological models are mathematical models that capture the dynamics of epidemics. The spread of the virus has two routes: exogenous and endogenous. The exogenous spread is from outside the population under study, and the endogenous spread is within the population under study. Although some models consider the exogenous source of infection, they have not studied the interplay between exogenous and endogenous spreads. In this paper, we introduce a novel model, the Exo-SIR model, that captures both the exogenous and endogenous spread of the virus. We analyze the relationship between endogenous and exogenous infections during the Covid19 pandemic. First, we simulate the Exo-SIR model without assuming any contact network for the population. Second, we simulate it by assuming that the contact network is a scale-free network. Third, we implement the Exo-SIR model on a real dataset regarding Covid19. We found that endogenous infection is influenced by even a minimal rate of exogenous infection. Also, we found that in the presence of exogenous infection, the endogenous infection peak becomes higher, and the peak occurs earlier. This means that if we consider our response to a pandemic like Covid19, we should be prepared for an earlier and higher number of cases than the SIR model suggests if there are exogenous source(s) of infection.
Submitted 19 September, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
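The abstract above does not state the Exo-SIR equations, so the sketch below is only an assumed illustration of the idea: a standard SIR model with one extra term, `eps * S`, standing in for infections that arrive from outside the population. The parameter `eps`, the function names, and the forward-Euler integration are all choices made for this example and should not be read as the authors' formulation. Setting `eps = 0` recovers plain SIR.

```python
def simulate(beta, gamma, eps, n=10_000, i0=1, days=200, dt=0.1):
    """Integrate an SIR model with a hypothetical exogenous infection term.

    beta  : endogenous transmission rate (per day)
    gamma : recovery rate (per day)
    eps   : assumed per-capita rate at which susceptibles are infected
            from outside the population; eps=0 gives the standard SIR model
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    history = []
    for step in range(round(days / dt)):
        history.append((step * dt, s, i, r))
        endo = beta * s * i / n * dt   # infections caused inside the population
        exo = eps * s * dt             # infections imported from outside
        rec = gamma * i * dt           # recoveries
        s -= endo + exo
        i += endo + exo - rec
        r += rec
    return history


def peak(history):
    """Return (time, infected count) at the maximum of the infected curve."""
    t, _, i, _ = max(history, key=lambda row: row[2])
    return t, i
```

Comparing `peak(simulate(0.3, 0.1, 0.0))` with `peak(simulate(0.3, 0.1, 0.001))` gives a quick way to see how a small exogenous rate shifts the timing of the infection peak in this toy setting.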
-
On the identification of $k$-inductively pierced codes using toric ideals
Authors:
Molly Hoch,
Samuel Muthiah,
Nida Obatake
Abstract:
Neural codes are binary codes in $\{0,1\}^n$; here we focus on the ones which represent the firing patterns of a type of neuron called place cells. There is much interest in determining which neural codes can be realized by a collection of convex sets. However, drawing representations of these convex sets, particularly as the number of neurons in a code increases, can be very difficult. Nevertheless, for a class of codes that are said to be $k$-inductively pierced for $k=0,1,2$, there is an algorithm for drawing Euler diagrams. Here we use the toric ideal of a code to show sufficient conditions for a code to be 1- or 2-inductively pierced, so that we may use the existing algorithm to draw realizations of such codes.
Submitted 6 July, 2018;
originally announced July 2018.
-
Every Binary Code Can Be Realized by Convex Sets
Authors:
Megan K. Franke,
Samuel Muthiah
Abstract:
Much work has been done to identify which binary codes can be represented by collections of open convex or closed convex sets. While not all binary codes can be realized by such sets, here we prove that every binary code can be realized by convex sets when there is no restriction on whether the sets are all open or closed. We achieve this by constructing a convex realization for an arbitrary code with $k$ nonempty codewords in $\mathbb{R}^{k-1}$. This result justifies the usual restriction of the definition of convex neural codes to include only those that can be realized by receptive fields that are all either open convex or closed convex. We also show that the dimension of our construction cannot in general be lowered.
Submitted 26 April, 2018; v1 submitted 8 November, 2017;
originally announced November 2017.
-
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System
Authors:
Sathappan Muthiah,
Patrick Butler,
Rupinder Paul Khandpur,
Parang Saraf,
Nathan Self,
Alla Rozovskaya,
Liang Zhao,
Jose Cadena,
Chang-Tien Lu,
Anil Vullikanti,
Achla Marathe,
Kristen Summers,
Graham Katz,
Andy Doyle,
Jaime Arredondo,
Dipak K. Gupta,
David Mares,
Naren Ramakrishnan
Abstract:
EMBERS is an anticipatory intelligence system forecasting population-level events in multiple countries of Latin America. A deployed system from 2012, EMBERS has been generating alerts 24x7 by ingesting a broad range of data sources including news, blogs, tweets, machine coded events, currency rates, and food prices. In this paper, we describe our experiences operating EMBERS continuously for nearly 4 years, with specific attention to the discoveries it has enabled, correct as well as missed forecasts, and lessons learnt from participating in a forecasting tournament including our perspectives on the limits of forecasting and ethical considerations.
Submitted 31 March, 2016;
originally announced April 2016.
-
Hierarchical Quickest Change Detection via Surrogates
Authors:
Prithwish Chakraborty,
Sathappan Muthiah,
Ravi Tandon,
Naren Ramakrishnan
Abstract:
Change detection (CD) in time series data is a critical problem as it reveals changes in the underlying generative processes driving the time series. Despite having received significant attention, one important unexplored aspect is how to efficiently utilize additional correlated information to improve the detection and the understanding of changepoints. We propose hierarchical quickest change detection (HQCD), a framework that formalizes the process of incorporating additional correlated sources for early changepoint detection. The core ideas behind HQCD are rooted in the theory of quickest detection and HQCD can be regarded as its novel generalization to a hierarchical setting. The sources are classified into targets and surrogates, and HQCD leverages this structure to systematically assimilate observed data to update changepoint statistics across layers. Decisions on actual changepoints are made by minimizing the delay while still maintaining reliability bounds. In addition, HQCD also uncovers interesting relations between changes at targets and changes across surrogates. We validate HQCD for reliability and performance against several state-of-the-art methods on both a synthetic dataset (known changepoints) and several real-life examples (unknown changepoints). Our experiments indicate that we gain significant robustness without loss of detection delay through HQCD. Our real-life experiments also showcase the usefulness of the hierarchical setting by connecting the surrogate sources (such as Twitter chatter) to target sources (such as employment-related protests that ultimately lead to major uprisings).
Submitted 31 March, 2016;
originally announced March 2016.
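As background for the quickest-detection theory that the abstract above builds on, here is a minimal sketch of the classical single-stream CUSUM test for a shift in the mean of Gaussian data. It is not HQCD and has no hierarchy of targets and surrogates. The function name and the assumption that the pre-change mean `mu0`, post-change mean `mu1`, and standard deviation `sigma` are known are illustrative choices.

```python
def cusum_alarm(xs, mu0, mu1, sigma, threshold):
    """Return the index of the first sample at which the CUSUM statistic
    reaches `threshold`, or None if it never does.

    Each sample contributes the Gaussian log-likelihood ratio of the
    post-change model N(mu1, sigma^2) against the pre-change model
    N(mu0, sigma^2). The running sum is clipped at zero so that evidence
    gathered before the change cannot go arbitrarily negative.
    """
    stat = 0.0
    for k, x in enumerate(xs):
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return k
    return None
```

Raising `threshold` lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to when it mentions minimizing delay under reliability bounds.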
-
Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning
Authors:
Yue Ning,
Sathappan Muthiah,
Huzefa Rangwala,
Naren Ramakrishnan
Abstract:
Forecasting events like civil unrest movements, disease outbreaks, financial market movements and government elections from open source indicators such as news feeds and social media streams is an important and challenging problem. From the perspective of human analysts and policy makers, forecasting algorithms need to provide supporting evidence and identify the causes related to the event of interest. We develop a novel multiple instance learning based approach that jointly tackles the problems of identifying evidence-based precursors and forecasting events into the future. Specifically, given a collection of streaming news articles from multiple sources, we develop a nested multiple instance learning approach to forecast significant societal events across three countries in Latin America. Our algorithm is able to identify news articles considered as precursors for a protest. Our empirical evaluation shows the strengths of our proposed approaches in filtering candidate precursors, forecasting the occurrence of events with a lead time and predicting the characteristics of different events in comparison to several other formulations. We demonstrate through case studies the effectiveness of our proposed model in filtering the candidate precursors for inspection by a human analyst.
Submitted 24 August, 2016; v1 submitted 25 February, 2016;
originally announced February 2016.
-
'Beating the news' with EMBERS: Forecasting Civil Unrest using Open Source Indicators
Authors:
Naren Ramakrishnan,
Patrick Butler,
Sathappan Muthiah,
Nathan Self,
Rupinder Khandpur,
Parang Saraf,
Wei Wang,
Jose Cadena,
Anil Vullikanti,
Gizem Korkmaz,
Chris Kuhlman,
Achla Marathe,
Liang Zhao,
Ting Hua,
Feng Chen,
Chang-Tien Lu,
Bert Huang,
Aravind Srinivasan,
Khoa Trinh,
Lise Getoor,
Graham Katz,
Andy Doyle,
Chris Ackermann,
Ilya Zavorin,
Jim Ford
, et al. (5 additional authors not shown)
Abstract:
We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings.
Submitted 27 February, 2014; v1 submitted 27 February, 2014;
originally announced February 2014.