Search | arXiv e-print repository

Information About Other Players in Mechanism Design

Abstract: We show the existence of mechanism design settings where the social planner has an interest in players receiving noisy signals about the types of other agents. When the social planner is interested only in partial implementation, any social choice rule that is incentive compatible after players receive additional information about other agents was originally incentive compatible prior to the change in information structure. However, information about other agents can eliminate undesired equilibria in an implementing mechanism. Thus, there are social choice rules which are not fully implementable in a given information environment that become fully implementable after players have additional information about the types of other agents. We provide some general conditions under which an undesired equilibrium can be eliminated by additional information about other players.

Submitted 28 May, 2024; originally announced July 2024.

Comments: undergraduate thesis at Harvard College. comments welcome!

arXiv:2403.15128 [pdf, other]

An Agent-Centric Perspective on Norm Enforcement and Sanctions

Authors: Elena Yan, Luis G. Nardin, Jomi F. Hübner, Olivier Boissier

Abstract: In increasingly autonomous and highly distributed multi-agent systems, centralized coordination becomes impractical and raises the need for governance and enforcement mechanisms from an agent-centric perspective. In our conceptual view, sanctioning norm enforcement is part of this agent-centric approach and aims at promoting norm compliance while preserving agents' autonomy. The few works dealing with sanctioning norm enforcement and sanctions from the agent-centric perspective present limitations regarding the representation of sanctions and the comprehensiveness of their norm enforcement process. To address these drawbacks, we propose the NPL(s), an extension of the NPL normative programming language enriched with the representation of norms and sanctions as first-class abstractions. We also propose a BDI normative agent architecture embedding an engine for processing the NPL(s) language and a set of capabilities for approaching more comprehensively the sanctioning norm enforcement process. We apply our contributions in a case study for improving the robustness of agents' decision-making in a production automation system.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2307.00734 [pdf, other]

On the choice of training data for machine learning of geostrophic mesoscale turbulence

Authors: F. E. Yan, J. Mak, Y. Wang

Abstract: 'Data' plays a central role in data-driven methods, but is not often the subject of focus in investigations of machine learning algorithms as applied to Earth System Modeling related problems. Here we consider the case of eddy-mean interaction in rotating stratified turbulence in the presence of lateral boundaries, a problem of relevance to ocean modeling, where the eddy fluxes contain dynamically inert rotational components that are expected to contaminate the learning process. An often utilized choice in the literature is to learn from the divergence of the eddy fluxes. Here we provide theoretical arguments and numerical evidence that learning from the eddy fluxes with the rotational component appropriately filtered out results in models with comparable or better skill, but substantially improved robustness. If we simply want a data-driven model to have predictive skill then the data choice and/or quality may not be critical, but we argue it is highly desirable and perhaps even necessary if we want to leverage data-driven methods to aid in discovering unknown or hidden physical processes within the data itself.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 23 pages, 8 figures

arXiv:2301.09196 [pdf, ps, other]

Universality for Cokernels of Dedekind Domain Valued Random Matrices

Authors: Eric Yan

Abstract: We use the moment method of Wood to study the distribution of random finite modules over a countable Dedekind domain with finite quotients, generated by taking cokernels of random $n\times n$ matrices with entries valued in the domain. Previously, Wood found that when the entries of a random $n\times n$ integral matrix are not too concentrated modulo a prime, the asymptotic distribution (as $n\to\infty$) of the cokernel matches the Cohen and Lenstra conjecture on the distribution of class groups of imaginary quadratic fields. We develop and prove a condition that produces a similar universality result for random matrices with entries valued in a countable Dedekind domain with finite quotients.

Submitted 10 June, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: 14 pages, no figures

arXiv:2208.07634 [pdf, other]

doi 10.1029/2022MS003223

On constraining the mesoscale eddy energy dissipation time-scale

Authors: Julian Mak, Alexandros Avdis, Tomos W. David, Han Seul Lee, Yongsu Na, Yan Wang, Fei Er Yan

Abstract: A physically plausible lower bound on the spatially varying geostrophic mesoscale eddy energy dissipation time-scale within the ocean, related to the geographical energy transfer rate out of the geostrophic mesoscales, is provided by means of a simple and computationally inexpensive inverse calculation. Data diagnosed from a high resolution global configuration ocean simulation is supplied to a parameterized model of the geostrophic mesoscale eddy energy, from which the dissipation time-scale results as a solution to an optimization calculation. We find that the dissipation time-scale is shortest in the Southern Ocean, in the Western Boundary Currents, and on the western boundaries, consistent with the expectation that these regions are notable sites of baroclinic activity with processes leading to energy transfer out of the geostrophic mesoscales. Although our solution should be interpreted as a lower bound given the assumptions going into the calculation, it serves as an important physically consistent base line reference for further investigations into ocean energetics, as well as for an intended inference calculation that is more complete but also much more complex.

Submitted 16 August, 2022; originally announced August 2022.

Comments: 27 pages, 8 figures, pre-print version (with minor updates to figures to reduce file size) submitted to J. Adv. Model. Earth Syst

arXiv:2110.14819 [pdf, other]

Characterizing and Taming Resolution in Convolutional Neural Networks

Authors: Eddie Yan, Liang Luo, Luis Ceze

Abstract: Image resolution has a significant effect on the accuracy and computational, storage, and bandwidth costs of computer vision model inference. These costs are exacerbated when scaling out models to large inference serving systems and make image resolution an attractive target for optimization. However, the choice of resolution inherently introduces additional tightly coupled choices, such as image crop size, image detail, and compute kernel implementation that impact computational, storage, and bandwidth costs. Further complicating this setting, the optimal choices from the perspective of these metrics are highly dependent on the dataset and problem scenario. We characterize this tradeoff space, quantitatively studying the accuracy and efficiency tradeoff via systematic and automated tuning of image resolution, image quality and convolutional neural network operators. With the insights from this study, we propose a dynamic resolution mechanism that removes the need to statically choose a resolution ahead of time.

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2110.03730 [pdf, other]

doi 10.18653/v1/2021.semeval-1.28

UoB at SemEval-2021 Task 5: Extending Pre-Trained Language Models to Include Task and Domain-Specific Information for Toxic Span Prediction

Authors: Erik Yan, Harish Tayyar Madabushi

Abstract: Toxicity is pervasive in social media and poses a major threat to the health of online communities. The recent introduction of pre-trained language models, which have achieved state-of-the-art results in many NLP tasks, has transformed the way in which we approach natural language processing. However, the inherent nature of pre-training means that they are unlikely to capture task-specific statistical information or learn domain-specific knowledge. Additionally, most implementations of these models typically do not employ conditional random fields, a method for simultaneous token classification. We show that these modifications can improve model performance on the Toxic Spans Detection task at SemEval-2021 to achieve a score within 4 percentage points of the top performing team.

Submitted 7 October, 2021; originally announced October 2021.

Comments: Published in Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021); Code available at: https://github.com/erikdyan/toxic_span_detection

Journal ref: 2021.semeval-1.28 (2021) 243-248

arXiv:2109.04452 [pdf, other]

Analysis of Language Change in Collaborative Instruction Following

Authors: Anna Effenberger, Eva Yan, Rhia Singh, Alane Suhr, Yoav Artzi

Abstract: We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise. Prior work studied such scenarios mostly in the context of reference games, and consistently found that language complexity is reduced along multiple dimensions, such as utterance length, as conventions are formed. In contrast, we find that, given the ability to increase instruction utility, instructors increase language complexity along these previously studied dimensions to better collaborate with increasingly skilled instruction followers.

Submitted 9 September, 2021; originally announced September 2021.

Comments: Findings of EMNLP 2021 Short Paper

arXiv:2101.05420 [pdf, other]

The Determinant of $\{\pm 1\}$-Matrices and Oriented Hypergraphs

Authors: Lucas J. Rusnak, Josephine Reynes, Russell Li, Eric Yan, Justin Yu

Abstract: The determinants of $\{\pm 1\}$-matrices are calculated via the oriented hypergraphic Laplacian and summing over an incidence generalization of vertex cycle-covers. These cycle-covers are signed and partitioned into families based on their hyperedge containment. Every non-edge-monic family is shown to contribute a net value of $0$ to the Laplacian, while each edge-monic family is shown to sum to the absolute value of the determinant of the original incidence matrix. Simple symmetries are identified as well as their relationship to Hadamard's maximum determinant problem. Finally, the entries of the incidence matrix are reclaimed using only the signs of an adjacency-minimal set of cycle-covers from an edge-monic family.

Submitted 29 June, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: 17 pages, 11 figures

MSC Class: 05C50; 05B20; 05C65; 05C22

arXiv:2005.07722 [pdf, other]

Oriented Hypergraphs: Balanceability

Authors: Lucas J. Rusnak, Selena Li, Brian Xu, Eric Yan, Shirley Zhu

Abstract: An oriented hypergraph is an oriented incidence structure that extends the concepts of signed graphs, balanced hypergraphs, and balanced matrices. We introduce hypergraphic structures and techniques that generalize the circuit classification of the signed graphic frame matroid to any oriented hypergraphic incidence matrix via its locally-signed-graphic substructure. To achieve this, Camion's algorithm is applied to oriented hypergraphs to provide a generalization of reorientation sets and frustration that is only well-defined on balanceable oriented hypergraphs. A simple partial characterization of unbalanceable circuits extends the applications to representable matroids demonstrating that the difference between the Fano and non-Fano matroids is one of balance.

Submitted 15 May, 2020; originally announced May 2020.

Comments: 19 pages, 9 figures

MSC Class: 05C75 (Primary) 05C65; 05C22; 05C50; 05B35 (Secondary)

arXiv:2004.12275 [pdf]

Citation Cascade and the Evolution of Topic Relevance

Authors: Chao Min, Qingyu Chen, Erjia Yan, Yi Bu, Jianjun Sun

Abstract: Citation analysis, as a tool for quantitative studies of science, has long emphasized direct citation relations, leaving indirect or high order citations overlooked. However, a series of early and recent studies demonstrate the existence of indirect and continuous citation impact across generations. Adding to the literature on high order citations, we introduce the concept of a citation cascade: the constitution of a series of subsequent citing events initiated by a certain publication. We investigate this citation structure by analyzing more than 450,000 articles and over 6 million citation relations. We show that citation impact exists not only within the three generations documented in prior research, but also in much further generations. Still, our experimental results indicate that two to four generations are generally adequate to trace a work's scientific impact. We also explore specific structural properties such as depth, width, structural virality, and size, which account for differences among individual citation cascades. Finally, we find evidence that it is more important for a scientific work to inspire trans domain (or indirectly related domain) works than to receive only intra domain recognition in order to achieve high impact. Our methods and findings can serve as a new tool for scientific evaluation and the modeling of scientific history.

Submitted 25 April, 2020; originally announced April 2020.

arXiv:2003.08773 [pdf, other]

Do CNNs Encode Data Augmentations?

Authors: Eddie Yan, Yanping Huang

Abstract: Data augmentations are important ingredients in the recipe for training robust neural networks, especially in computer vision. A fundamental question is whether neural network features encode data augmentation transformations. To answer this question, we introduce a systematic approach to investigate which layers of neural networks are the most predictive of augmentation transformations. Our approach uses features in pre-trained vision models with minimal additional processing to predict common properties transformed by augmentation (scale, aspect ratio, hue, saturation, contrast, and brightness). Surprisingly, neural network features not only predict data augmentation transformations, but they predict many transformations with high accuracy. After validating that neural networks encode features corresponding to augmentation transformations, we show that these features are encoded in the early layers of modern CNNs, though the augmentation signal fades in deeper layers.

Submitted 27 October, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

MSC Class: 68T45

arXiv:1906.06039 [pdf]

doi 10.1007/s11192-019-03311-9

Nine Million Book Items and Eleven Million Citations: A Study of Book-Based Scholarly Communication Using OpenCitations

Authors: Yongjun Zhu, Erjia Yan, Silvio Peroni, Chao Che

Abstract: Books have been widely used to share information and contribute to human knowledge. However, the quantitative use of books as a method of scholarly communication is relatively unexamined compared to journal articles and conference papers. This study uses the COCI dataset (a comprehensive open citation dataset provided by OpenCitations) to explore books' roles in scholarly communication. The COCI data we analyzed includes 445,826,118 citations from 46,534,705 bibliographic entities. By analyzing such a large amount of data, we provide a thorough, multifaceted understanding of books. Among the investigated factors are 1) temporal changes to book citations; 2) book citation distributions; 3) years to citation peak; 4) citation half-life; and 5) characteristics of the most-cited books. Results show that books have received less than 4% of total citations, and have been cited mainly by journal articles. Moreover, 97.96% of books have been cited fewer than ten times. Books take longer than other bibliographic materials to reach peak citation levels, yet are cited for the same duration as journal articles. Most-cited books tend to cover general (yet essential) topics, theories, and technological concepts in mathematics and statistics.

Submitted 6 December, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1901.04993 [pdf]

doi 10.1109/BigData.2017.8258000

Large-Scale Joint Topic, Sentiment & User Preference Analysis for Online Reviews

Authors: Xinli Yu, Zheng Chen, Wei-Shih Yang, Xiaohua Hu, Erjia Yan

Abstract: This paper presents a non-trivial reconstruction of a previous joint topic-sentiment-preference review model TSPRA with stick-breaking representation under the framework of variational inference (VI) and stochastic variational inference (SVI). TSPRA is a Gibbs Sampling based model that solves topics, word sentiments and user preferences altogether and has been shown to achieve good performance, but for large data sets it can only learn from a relatively small sample. We develop the variational models vTSPRA and svTSPRA to improve the time use, and our new approach is capable of processing millions of reviews. We rebuild the generative process, improve the rating regression, solve and present the coordinate-ascent updates of variational parameters, and show the time complexity of each iteration is theoretically linear in the corpus size, and the experiments on Amazon data sets show it converges faster than TSPRA and attains better results given the same amount of time. In addition, we tune svTSPRA into an online algorithm ovTSPRA that can monitor oscillations of sentiment and preference over time. Some interesting fluctuations are captured and possible explanations are provided. The results give strong visual evidence that user preference is better treated as an independent factor from sentiment.

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1812.09387 [pdf]

Correlated Anomaly Detection from Large Streaming Data

Authors: Zheng Chen, Xinli Yu, Yuan Ling, Bo Song, Wei Quan, Xiaohua Hu, Erjia Yan

Abstract: Correlated anomaly detection (CAD) from streaming data is a type of group anomaly detection and an essential task in useful real-time data mining applications like botnet detection, financial event detection, industrial process monitoring, etc. The primary approach for this type of detection in previous research is based on the principal score (PS) of divided batches or sliding windows by computing top eigenvalues of the correlation matrix, e.g. the Lanczos algorithm. However, this paper brings up the phenomenon of principal score degeneration for large data sets, and then mathematically and practically proves that current PS-based methods are likely to fail for CAD on large-scale streaming data even if the number of correlated anomalies grows with the data size at a reasonable rate; in reality, anomalies tend to be the minority of the data, and this issue can be more serious. We propose a framework with two novel randomized algorithms rPS and gPS for better detection of correlated anomalies from large streaming data of various correlation strength. The experiment shows high and balanced recall and estimated accuracy of our framework for anomaly detection from a large server log data set and a U.S. stock daily price data set in comparison to direct principal score evaluation and some other recent group anomaly detection algorithms. Moreover, our techniques significantly improve the computation efficiency and scalability for principal score calculation.

Submitted 14 January, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

arXiv:1812.07810 [pdf]

Fast Botnet Detection From Streaming Logs Using Online Lanczos Method

Authors: Zheng Chen, Xinli Yu, Chi Zhang, ** Zhang, Cui Lin, Bo Song, Jianliang Gao, Xiaohua Hu, Wei-Shih Yang, Erjia Yan

Abstract: Botnet, a group of coordinated bots, is becoming the main platform of malicious Internet activities like DDOS, click fraud, web scraping, spam/rumor distribution, etc. This paper focuses on design and experiment of a new approach for botnet detection from streaming web server logs, motivated by its wide applicability, real-time protection capability, ease of use and better security of sensitive data. Our algorithm is inspired by a Principal Component Analysis (PCA) to capture correlation in data, and we are the first to recognize and adapt the Lanczos method to improve the time complexity of PCA-based botnet detection from cubic to sub-cubic, which enables us to more accurately and sensitively detect botnets with sliding time windows rather than fixed time windows. We contribute a generalized online correlation matrix update formula, and a new termination condition for Lanczos iteration for our purpose based on error bound and non-decreasing eigenvalues of symmetric matrices. On our dataset of an ecommerce website's logs, experiments show the time cost of the Lanczos method with different time windows is consistently only 20% to 25% of PCA.

Submitted 19 December, 2018; originally announced December 2018.

arXiv:1811.11270 [pdf]

doi 10.1016/j.joi.2019.02.007

Challenges of measuring the impact of software: an examination of the lme4 R package

Authors: Kai Li, Pei-Ying Chen, Erjia Yan

Abstract: The rise of software as a research object is mirrored in the increasing interest in quantitative studies of scientific software. However, due to the inconsistent practice of citing software, most of the existing studies analyzing the impact of scientific software are based on identification of software name mentions in full-text publications. Despite its limitations, citation data have a much larger quantity and broader coverage of scientific fields than full-text data and thus could support findings in much larger scopes. This paper presents an analysis aiming to evaluate the extent to which citation data can be used to reconstruct the impact of software. Specifically, we identified the variety of citable objects related to the lme4 R package and examined how the package's impact is scattered across these objects. Our results reveal a little-discussed challenge of using citation data to measure the impact of software, that even within the category of formal citation, there might be different forms in which the same software object is cited. This challenge can be mitigated by more carefully selecting objects as the proxy of software. However, it cannot be fully solved until we have a one-software-one-proxy policy for software citation.

Submitted 27 November, 2018; originally announced November 2018.

arXiv:1807.04188 [pdf, other]

A Hardware-Software Blueprint for Flexible Deep Learning Specialization

Authors: Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Abstract: Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators. We propose VTA, a programmable deep learning architecture template designed to be extensible in the face of evolving workloads. VTA achieves this flexibility via a parametrizable architecture, two-level ISA, and a JIT compiler. The two-level ISA is based on (1) a task-ISA that explicitly orchestrates concurrent compute and memory tasks and (2) a microcode-ISA which implements a wide variety of operators with single-cycle tensor-tensor operations. Next, we propose a runtime system equipped with a JIT compiler for flexible code-generation and heterogeneous execution that enables effective use of the VTA architecture. VTA is integrated and open-sourced into Apache TVM, a state-of-the-art deep learning compilation stack that provides flexibility for diverse models and divergent hardware backends. We propose a flow that performs design space exploration to generate a customized hardware architecture and software operator library that can be leveraged by mainstream learning frameworks. We demonstrate our approach by deploying optimized deep learning models used for object classification and style transfer on edge-class FPGAs.

Submitted 22 April, 2019; v1 submitted 11 July, 2018; originally announced July 2018.

Comments: 6 pages plus references, 8 figures

arXiv:1805.08166 [pdf, other]

Learning to Optimize Tensor Programs

Authors: Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Abstract: We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems. However, existing systems rely on manually optimized libraries such as cuDNN where only a narrow range of server class GPUs are well-supported. The reliance on hardware-specific operator libraries limits the applicability of high-level graph optimizations and incurs significant engineering costs when deploying to new hardware targets. We use learning to remove this engineering burden. We learn domain-specific statistical cost models to guide the search of tensor operator implementations over billions of possible program variants. We further accelerate the search by effective model transfer across workloads. Experimental results show that our framework delivers performance competitive with state-of-the-art hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPU.

Submitted 8 January, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: NeurIPS 2018

arXiv:1802.04799 [pdf, other]

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

Authors: Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.

Submitted 5 October, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

Comments: Significantly improved version, add automated optimization

arXiv:1612.03231 [pdf]

A natural language interface to a graph-based bibliographic information retrieval system

Authors: Yongjun Zhu, Erjia Yan, Il-Yeol Song

Abstract: With the ever-increasing scientific literature, there is a need for a natural language interface to bibliographic information retrieval systems to retrieve related information effectively. In this paper, we propose a natural language interface, NLI-GIBIR, to a graph-based bibliographic information retrieval system. In designing NLI-GIBIR, we developed a novel framework that is applicable to graph-based bibliographic information retrieval systems. Our framework integrates algorithms/heuristics for interpreting and analyzing natural language bibliographic queries. NLI-GIBIR allows users to search for a variety of bibliographic data through natural language. A series of text- and linguistic-based techniques are used to analyze and answer natural language queries, including tokenization, named entity recognition, and syntactic analysis. We find that our framework can effectively represent and address complex bibliographic information needs. Thus, the contributions of this paper are as follows: First, to our knowledge, it is the first attempt to propose a natural language interface to graph-based bibliographic information retrieval. Second, we propose a novel customized natural language processing framework that integrates a few original algorithms/heuristics for interpreting and analyzing natural language bibliographic queries. Third, we show that the proposed framework and natural language interface provide a practical solution in building real-world natural language interface-based bibliographic information retrieval systems. Our experimental results show that the presented system can correctly answer 39 out of 40 example natural language queries with varying lengths and complexities.

Submitted 9 December, 2016; originally announced December 2016.

arXiv:1503.06664 [pdf]

doi 10.1109/TMAG.2015.2397880

Bit Patterned Magnetic Recording: Theory, Media Fabrication, and Recording Performance

Authors: Thomas R. Albrecht, Hitesh Arora, Vipin Ayanoor-Vitikkate, Jean-Marc Beaujour, Daniel Bedau, David Berman, Alexei L. Bogdanov, Yves-Andre Chapuis, Julia Cushen, Elizabeth E. Dobisz, Gregory Doerk, He Gao, Michael Grobis, Bruce Gurney, Weldon Hanson, Olav Hellwig, Toshiki Hirano, Pierre-Olivier Jubert, Dan Kercher, Jeffrey Lille, Zuwei Liu, C. Mathew Mate, Yuri Obukhov, Kanaiyalal C. Patel, Kurt Rubin, et al. (6 additional authors not shown)

Abstract: Bit Patterned Media (BPM) for magnetic recording provide a route to densities $>1 Tb/in^2$ and circumvent many of the challenges associated with conventional granular media technology. Instead of recording a bit on an ensemble of random grains, BPM uses an array of lithographically defined isolated magnetic islands, each of which stores one bit. Fabrication of BPM is viewed as the greatest challenge for its commercialization. In this article we describe a BPM fabrication method which combines e-beam lithography, directed self-assembly of block copolymers, self-aligned double patterning, nanoimprint lithography, and ion milling to generate BPM based on CoCrPt alloys. This combination of fabrication technologies achieves feature sizes of $<10 nm$, significantly smaller than what conventional semiconductor nanofabrication methods can achieve. In contrast to earlier work which used hexagonal close-packed arrays of round islands, our latest approach creates BPM with rectangular bitcells, which are advantageous for integration with existing hard disk drive technology. The advantages of rectangular bits are analyzed from a theoretical and modeling point of view, and system integration requirements such as servo patterns, implementation of write synchronization, and providing for a stable head-disk interface are addressed in the context of experimental results. Optimization of magnetic alloy materials for thermal stability, writeability, and switching field distribution is discussed, and a new method for growing BPM islands on a patterned template is presented. New recording results at $1.6 Td/in^2$ (teradot/inch${}^2$, roughly equivalent to $1.3 Tb/in^2$) demonstrate a raw error rate $<10^{-2}$, which is consistent with the recording system requirements of modern hard drives. Extendibility of BPM to higher densities, and its eventual combination with energy assisted recording are explored.

Submitted 19 March, 2015; originally announced March 2015.

Comments: 44 pages

ACM Class: B.3.2; B.4.2

arXiv:1309.2546 [pdf]

Finding knowledge paths among scientific disciplines

Authors: Erjia Yan

Abstract: This paper discovers patterns of knowledge dissemination among scientific disciplines. While the transfer of knowledge is largely unobservable, citations from one discipline to another have been proven to be an effective proxy to study disciplinary knowledge flow. This study constructs a knowledge flow network in which a node represents a Journal Citation Report subject category and a link denotes the citations from one subject category to another. Using the concept of shortest path, several quantitative measurements are proposed and applied to a knowledge flow network. Based on an examination of subject categories in Journal Citation Report, this paper finds that social science domains tend to be more self-contained and thus it is more difficult for knowledge from other domains to flow into them; at the same time, knowledge from science domains, such as biomedicine-, chemistry-, and physics-related domains, can access and be accessed by other domains more easily. This paper also finds that social science domains are more disunified than science domains, as three fifths of the knowledge paths from one social science domain to another need at least one science domain to serve as an intermediate. This paper contributes to discussions on disciplinarity and interdisciplinarity by providing empirical analysis.

Submitted 10 September, 2013; originally announced September 2013.

Comments: 31 pages, 12 figures
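
The shortest-path idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's method or data: the category names, the edges, and the `shortest_path` helper are all hypothetical, and the sketch assumes an unweighted directed network where an edge from A to B stands for citations from category A to category B.

```python
from collections import deque

# Hypothetical toy network (not from the paper): an edge A -> B means
# subject category A cites subject category B.
CITATIONS = {
    "Sociology": ["Economics", "Statistics"],
    "Economics": ["Statistics", "Mathematics"],
    "Statistics": ["Mathematics"],
    "Mathematics": ["Physics"],
    "Physics": ["Mathematics"],
    "Law": ["Law"],  # a fully self-contained category
}


def shortest_path(graph, source, target):
    """Return one shortest directed path as a list of nodes, or None."""
    if source == target:
        return [source]
    previous = {source: None}  # node -> node it was first reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in previous:
                continue  # already reached by a path that is no longer
            previous[neighbor] = node
            if neighbor == target:
                # Walk the predecessor links back to the source.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # target is unreachable from source


path = shortest_path(CITATIONS, "Sociology", "Physics")
print(path, "hops:", len(path) - 1)
print(shortest_path(CITATIONS, "Law", "Physics"))
```

In this toy network the path between two categories passes through intermediate categories, which is the kind of pattern the abstract describes for social science domains; a category with no outgoing links to others yields no path at all.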

arXiv:1309.2486 [pdf]

doi 10.1371/journal.pone.0071416

Entitymetrics: Measuring the Impact of Entities

Authors: Ying Ding, Min Song, Jia Han, Qi Yu, Erjia Yan, Lili Lin, Tamy Chambers

Abstract: This paper proposes entitymetrics to measure the impact of knowledge units. Entitymetrics highlight the importance of entities embedded in scientific literature for further knowledge discovery. In this paper, we use Metformin, a drug for diabetes, as an example to form an entity-entity citation network based on literature related to Metformin. We then calculate the network features and compare the centrality ranks of biological entities with results from Comparative Toxicogenomics Database (CTD). The comparison demonstrates the usefulness of entitymetrics to detect most of the outstanding interactions manually curated in CTD.

Submitted 10 September, 2013; originally announced September 2013.

Journal ref: PLOS ONE 8(8): e71416, 2013

arXiv:1211.5820 [pdf]

A bird's-eye view of scientific trading: Dependency relations among fields of science

Authors: Erjia Yan, Ying Ding, Blaise Cronin, Loet Leydesdorff

Abstract: We use a trading metaphor to study knowledge transfer in the sciences as well as the social sciences. The metaphor comprises four dimensions: (a) Discipline Self-dependence, (b) Knowledge Exports/Imports, (c) Scientific Trading Dynamics, and (d) Scientific Trading Impact. This framework is applied to a dataset of 221 Web of Science subject categories. We find that: (i) the Scientific Trading Impact and Dynamics of Materials Science and Transportation Science have increased; (ii) Biomedical Disciplines, Physics, and Mathematics are significant knowledge exporters, as is Statistics & Probability; (iii) in the social sciences, Economics, Business, Psychology, Management, and Sociology are important knowledge exporters; (iv) Discipline Self-dependence is associated with specialized domains which have ties to professional practice (e.g., Law, Ophthalmology, Dentistry, Oral Surgery & Medicine, Psychology, Psychoanalysis, Veterinary Sciences, and Nursing).

Submitted 25 November, 2012; originally announced November 2012.

arXiv:1105.3212 [pdf]

A recursive field-normalized bibliometric performance indicator: An application to the field of library and information science

Authors: Ludo Waltman, Erjia Yan, Nees Jan van Eck

Abstract: Two commonly used ideas in the development of citation-based research performance indicators are the idea of normalizing citation counts based on a field classification scheme and the idea of recursive citation weighing (like in PageRank-inspired indicators). We combine these two ideas in a single indicator, referred to as the recursive mean normalized citation score indicator, and we study the validity of this indicator. Our empirical analysis shows that the proposed indicator is highly sensitive to the field classification scheme that is used. The indicator also has a strong tendency to reinforce biases caused by the classification scheme. Based on these observations, we advise against the use of indicators in which the idea of normalization based on a field classification scheme and the idea of recursive citation weighing are combined.

Submitted 16 May, 2011; originally announced May 2011.

arXiv:1012.4876 [pdf]

Weighted citation: An indicator of an article's prestige

Authors: Erjia Yan, Ying Ding

Abstract: We propose using the technique of weighted citation to measure an article's prestige. The technique allocates a different weight to each reference by taking into account the impact of citing journals and citation time intervals. Weighted citation captures prestige, whereas citation counts capture popularity. We compare the value variances for popularity and prestige for articles published in the Journal of the American Society for Information Science and Technology from 1998 to 2007, and find that the majority have comparable status.