-
Understanding Creep Suppression Mechanism in Polymer Nanocomposites through Machine Learning
Authors:
Entao Yang,
James F. Pressly,
Bharath Natarajan,
Robert Colby,
Karen I. Winey,
Robert A. Riggleman
Abstract:
While recent efforts have shown how local structure plays an essential role in the dynamic heterogeneity of homogeneous glass-forming materials, systems containing interfaces such as thin films or composite materials remain poorly understood. It is known that interfaces perturb the molecular packing nearby, however, numerous studies show the dynamics are modified over a much larger range. Here, we examine the dynamics in polymer nanocomposites (PNCs) using a combination of simulations and experiments and quantitatively separate the role of polymer packing from other effects on the dynamics, as a function of distance from the nanoparticle surfaces. After showing good qualitative agreement between the simulations and experiments in glassy structure and creep compliance, we use a recently developed machine learning technique to decompose polymer dynamics in our simulated PNCs into structure-dependent and structure-independent processes. With this decomposition, the free energy barrier for polymer rearrangement can be described as a combination of packing-dependent and packing-independent barriers. We find both barriers are higher near nanoparticles and decrease with applied stress, quantitatively demonstrating that the slow interfacial dynamics is not solely due to polymer packing differences, but also the change of structure-dynamics relationships. Finally, we present how this decomposition can be used to accurately predict strain-time creep curves for PNCs from their static configuration, providing additional insights into the effects of polymer-nanoparticle interfaces on creep suppression in PNCs.
Submitted 25 April, 2022;
originally announced April 2022.
-
C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval
Authors:
Eugene Yang,
Suraj Nair,
Ramraj Chandradevan,
Rebecca Iglesias-Flores,
Douglas W. Oard
Abstract:
Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.
Submitted 25 April, 2022;
originally announced April 2022.
-
LitMind Dictionary: An Open-Source Online Dictionary
Authors:
Cunliang Kong,
Xuezhi Fang,
Liner Yang,
Yun Chen,
Erhong Yang
Abstract:
Dictionaries can help language learners to learn vocabulary by providing definitions of words. Since traditional dictionaries present word senses as discrete items in predefined inventories, they fall short of flexibility, which is required in providing specific meanings of words in particular contexts. In this paper, we introduce the LitMind Dictionary (https://dictionary.litmind.ink), an open-source online generative dictionary that takes a word and context containing the word as input and automatically generates a definition as output. Incorporating state-of-the-art definition generation models, it supports not only Chinese and English, but also Chinese-English cross-lingual queries. Moreover, it has a user-friendly front-end design that can help users understand the query words quickly and easily. All the code and data are available at https://github.com/blcuicall/litmind-dictionary.
Submitted 23 April, 2022;
originally announced April 2022.
-
BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling
Authors:
Cunliang Kong,
Yujie Wang,
Ruining Chong,
Liner Yang,
Hengyuan Zhang,
Erhong Yang,
Yaping Huang
Abstract:
This paper describes the BLCU-ICALL system used in the SemEval-2022 Task 1 Comparing Dictionaries and Word Embeddings, the Definition Modeling subtrack, achieving 1st on Italian, 2nd on Spanish and Russian, and 3rd on English and French. We propose a transformer-based multitasking framework to explore the task. The framework integrates multiple embedding architectures through the cross-attention mechanism, and captures the structure of glosses through a masking language model objective. Additionally, we also investigate a simple but effective model ensembling strategy to further improve the robustness. The evaluation results show the effectiveness of our solution. We release our code at: https://github.com/blcuicall/SemEval2022-Task1-DM.
Submitted 15 April, 2022;
originally announced April 2022.
-
Multitasking Framework for Unsupervised Simple Definition Generation
Authors:
Cunliang Kong,
Yun Chen,
Hengyuan Zhang,
Liner Yang,
Erhong Yang
Abstract:
The definition generation task can help language learners by providing explanations for unfamiliar words. This task has attracted much attention in recent years. We propose a novel task of Simple Definition Generation (SDG) to help language learners and low literacy readers. A significant challenge of this task is the lack of learner's dictionaries in many languages, and therefore the lack of data for supervised training. We explore this task and propose a multitasking framework SimpDefiner that only requires a standard dictionary with complex definitions and a corpus containing arbitrary simple texts. We disentangle the complexity factors from the text by carefully designing a parameter sharing scheme between two decoders. By jointly training these components, the framework can generate both complex and simple definitions simultaneously. We demonstrate that the framework can generate relevant, simple definitions for the target words through automatic and manual evaluations on English and Chinese datasets. Our method outperforms the baseline model by a 1.77 SARI score on the English dataset, and raises the proportion of the low level (HSK level 1-3) words in Chinese definitions by 3.87%.
Submitted 24 March, 2022;
originally announced March 2022.
-
SLAM-Supported Self-Training for 6D Object Pose Estimation
Authors:
Ziqi Lu,
Yihao Zhang,
Kevin Doherty,
Odin Severinsen,
Ethan Yang,
John Leonard
Abstract:
Recent progress in object pose prediction provides a promising path for robots to build object-level scene representations during navigation. However, as we deploy a robot in novel environments, the out-of-distribution data can degrade the prediction performance. To mitigate the domain gap, we can potentially perform self-training in the target domain, using predictions on robot-captured images as pseudo labels to fine-tune the object pose estimator. Unfortunately, the pose predictions are typically outlier-corrupted, and it is hard to quantify their uncertainties, which can result in low-quality pseudo-labeled data. To address the problem, we propose a SLAM-supported self-training method, leveraging robot understanding of the 3D scene geometry to enhance the object pose inference performance. Combining the pose predictions with robot odometry, we formulate and solve pose graph optimization to refine the object pose estimates and make pseudo labels more consistent across frames. We incorporate the pose prediction covariances as variables into the optimization to automatically model their uncertainties. This automatic covariance tuning (ACT) process can fit 6D pose prediction noise at the component level, leading to higher-quality pseudo training data. We test our method with the deep object pose estimator (DOPE) on the YCB video dataset and in real robot experiments. It achieves respectively 34.3% and 17.8% accuracy enhancements in pose prediction on the two tests. Our code is available at https://github.com/520xyxyzq/slam-super-6d.
Submitted 15 August, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
TARexp: A Python Framework for Technology-Assisted Review Experiments
Authors:
Eugene Yang,
David D. Lewis
Abstract:
Technology-assisted review (TAR) is an important industrial application of information retrieval (IR) and machine learning (ML). While a small TAR research community exists, the complexity of TAR software and workflows is a major barrier to entry. Drawing on past open source TAR efforts, as well as design patterns from the IR and ML open source software, we present an open source Python framework for conducting experiments on TAR algorithms. Key characteristics of this framework are declarative representations of workflows and experiment plans, the ability for components to play variable numbers of workflow roles, and state maintenance and restart capabilities. Users can draw on reference implementations of standard TAR algorithms while incorporating novel components to explore their research interests. The framework is available at https://github.com/eugene-yang/tarexp.
Submitted 24 April, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Patapsco: A Python Framework for Cross-Language Information Retrieval Experiments
Authors:
Cash Costello,
Eugene Yang,
Dawn Lawrie,
James Mayfield
Abstract:
While there are high-quality software frameworks for information retrieval experimentation, they do not explicitly support cross-language information retrieval (CLIR). To fill this gap, we have created Patapsco, a Python CLIR framework. This framework specifically addresses the complexity that comes with running experiments in multiple languages. Patapsco is designed to be extensible to many language pairs, to be scalable to large document collections, and to support reproducible experiments driven by a configuration file. We include Patapsco results on standard CLIR collections using multiple settings.
Submitted 24 January, 2022;
originally announced January 2022.
-
HC4: A New Suite of Test Collections for Ad Hoc CLIR
Authors:
Dawn Lawrie,
James Mayfield,
Douglas Oard,
Eugene Yang
Abstract:
HC4 is a new suite of test collections for ad hoc Cross-Language Information Retrieval (CLIR), with Common Crawl News documents in Chinese, Persian, and Russian, topics in English and in the document languages, and graded relevance judgments. New test collections are needed because existing CLIR test collections built using pooling of traditional CLIR runs have systematic gaps in their relevance judgments when used to evaluate neural CLIR methods. The HC4 collections contain 60 topics and about half a million documents for each of Chinese and Persian, and 54 topics and five million documents for Russian. Active learning was used to determine which documents to annotate after being seeded using interactive search and judgment. Documents were judged on a three-grade relevance scale. This paper describes the design and construction of the new test collections and provides baseline results for demonstrating their utility for evaluating systems.
Submitted 24 January, 2022;
originally announced January 2022.
-
Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration
Authors:
Georgios Zacharopoulos,
Adel Ejjeh,
Ying Jing,
En-Yu Yang,
Tianyu Jia,
Iulian Brumar,
Jeremy Intan,
Muhammad Huzaifa,
Sarita Adve,
Vikram Adve,
Gu-Yeon Wei,
David Brooks
Abstract:
The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain specific accelerator designs and configurations that maximize performance, given an area budget. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20x, as well as a speedup of up to 37x for smaller applications, compared to software-only implementations.
Submitted 21 January, 2022;
originally announced January 2022.
-
Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models
Authors:
Suraj Nair,
Eugene Yang,
Dawn Lawrie,
Kevin Duh,
Paul McNamee,
Kenton Murray,
James Mayfield,
Douglas W. Oard
Abstract:
The advent of transformer-based models such as BERT has led to the rise of neural ranking models. These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25. While monolingual retrieval tasks have benefited from large-scale training collections such as MS MARCO and advances in neural architectures, cross-language retrieval tasks have fallen behind these advancements. This paper introduces ColBERT-X, a generalization of the ColBERT multi-representation dense retrieval model that uses the XLM-RoBERTa (XLM-R) encoder to support cross-language information retrieval (CLIR). ColBERT-X can be trained in two ways. In zero-shot training, the system is trained on the English MS MARCO collection, relying on the XLM-R encoder for cross-language mappings. In translate-train, the system is trained on the MS MARCO English queries coupled with machine translations of the associated MS MARCO passages. Results on ad hoc document ranking tasks in several languages demonstrate substantial and statistically significant improvements of these trained dense retrieval models over traditional lexical CLIR baselines.
Submitted 20 January, 2022;
originally announced January 2022.
-
Approximation Algorithms for Maximum Matchings in Geometric Intersection Graphs
Authors:
Sariel Har-Peled,
Everett Yang
Abstract:
We present $(1- \varepsilon)$-approximation algorithms for maximum cardinality matchings in disk intersection graphs -- all with near linear running time. We also present an estimation algorithm that returns a $(1\pm \varepsilon)$-approximation to the size of such matchings -- this algorithm runs in linear time for unit disks, and in $O(n \log n)$ time for general disks (as long as the density is relatively small).
Submitted 15 March, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
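For readers unfamiliar with the setting of the entry above, the following is a minimal Python sketch of a disk intersection graph and a matching in it. It is not the algorithm from the paper: it uses a plain greedy maximal matching, which is only a 1/2-approximation and takes quadratic time, as a simple baseline. The function names and the `(x, y, radius)` disk representation are assumptions made for illustration.

```python
import math


def disks_intersect(a, b):
    """Two disks (x, y, radius) intersect when the distance between
    their centers is at most the sum of their radii."""
    (x1, y1, r1), (x2, y2, r2) = a, b
    return math.hypot(x1 - x2, y1 - y2) <= r1 + r2


def greedy_matching(disks):
    """Greedy maximal matching in the disk intersection graph.

    Baseline only: a maximal matching has at least half the size of a
    maximum matching, and this loop is O(n^2), unlike the near-linear
    (1 - eps)-approximation described in the abstract.
    """
    matched = set()
    matching = []
    for i in range(len(disks)):
        if i in matched:
            continue
        for j in range(i + 1, len(disks)):
            if j not in matched and disks_intersect(disks[i], disks[j]):
                matching.append((i, j))
                matched.update((i, j))
                break
    return matching


if __name__ == "__main__":
    example = [(0, 0, 1), (1.5, 0, 1), (10, 0, 1), (11, 0, 1), (30, 0, 1)]
    print(greedy_matching(example))
```

In the example, the first two disks overlap, the next two overlap, and the last disk is isolated, so the matching pairs up indices 0 with 1 and 2 with 3.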
-
YACLC: A Chinese Learner Corpus with Multidimensional Annotation
Authors:
Yingying Wang,
Cunliang Kong,
Liner Yang,
Yijun Wang,
Xiaorong Lu,
Renfen Hu,
Shan He,
Zhenghao Liu,
Yun Chen,
Erhong Yang,
Maosong Sun
Abstract:
Learner corpus collects language data produced by L2 learners, that is second or foreign-language learners. This resource is of great relevance for second language acquisition research, foreign-language teaching, and automatic grammatical error correction. However, there is little focus on learner corpus for Chinese as Foreign Language (CFL) learners. Therefore, we propose to construct a large-scale, multidimensional annotated Chinese learner corpus. To construct the corpus, we first obtain a large number of topic-rich texts generated by CFL learners. Then we design an annotation scheme including a sentence acceptability score as well as grammatical error and fluency-based corrections. We build a crowdsourcing platform to perform the annotation effectively (https://yaclc.wenmind.net). We name the corpus YACLC (Yet Another Chinese Learner Corpus) and release it as part of the CUGE benchmark (http://cuge.baai.ac.cn). By analyzing the original sentences and annotations in the corpus, we found that YACLC has a considerable size and very high annotation quality. We hope this corpus can further enhance the studies on Chinese International Education and Chinese automatic grammatical error correction.
Submitted 30 December, 2021;
originally announced December 2021.
-
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
Authors:
Yuan Yao,
Qingxiu Dong,
Jian Guan,
Boxi Cao,
Zhengyan Zhang,
Chaojun Xiao,
Xiaozhi Wang,
Fanchao Qi,
Junwei Bao,
Jinran Nie,
Zheni Zeng,
Yuxian Gu,
Kun Zhou,
Xuancheng Huang,
Wenhao Li,
Shuhuai Ren,
Jinliang Lu,
Chengqiang Xu,
Huadong Wang,
Guoyang Zeng,
Zile Zhou,
Jiajun Zhang,
Juanzi Li,
Minlie Huang,
Rui Yan
, et al. (10 additional authors not shown)
Abstract:
Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework. To facilitate CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.
Submitted 14 June, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing
Authors:
Joonhyung Park,
June Yong Yang,
Jinwoo Shin,
Sung Ju Hwang,
Eunho Yang
Abstract:
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample and has gained considerable attention recently for improving the generalizability of neural networks. A straightforward and widely used extension of Mixup is to combine it with regional dropout-like methods: removing random patches from a sample and replacing them with the features from another sample. Despite their simplicity and effectiveness, these methods are prone to creating harmful samples due to their randomness. To address this issue, 'maximum saliency' strategies were recently proposed: they select only the most informative features to prevent such a phenomenon. However, they now suffer from a lack of sample diversification as they always deterministically select regions with maximum saliency, injecting bias into the augmented data. In this paper, we present Saliency Grafting, a novel yet simple Mixup variant that captures the best of both worlds. Our idea is two-fold. By stochastically sampling the features and 'grafting' them onto another sample, our method effectively generates diverse yet meaningful samples. Its second ingredient is to produce the label of the grafted sample by mixing the labels in a saliency-calibrated fashion, which rectifies supervision misguidance introduced by the random sampling procedure. Our experiments on the CIFAR, Tiny-ImageNet, and ImageNet datasets show that our scheme outperforms the current state-of-the-art augmentation strategies not only in terms of classification accuracy, but also in coping under stress conditions such as data corruption and object occlusion.
Submitted 16 December, 2021;
originally announced December 2021.
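As background for the entry above, the sketch below shows the basic Mixup operation that the abstract builds on: a convex combination of two samples and of their labels. It is not Saliency Grafting itself (no saliency maps, patch selection, or calibrated label mixing). The function name, the list-based inputs, and the Beta(alpha, alpha) sampling of the mixing weight are assumptions chosen for illustration.

```python
import random


def mixup(x1, y1, x2, y2, alpha=1.0, lam=None, rng=random):
    """Basic Mixup on flat feature vectors and one-hot (or soft) labels.

    lam is the mixing weight; when it is not supplied it is drawn from
    Beta(alpha, alpha), a common choice that is assumed here.
    """
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
    print(x, y, lam)
```

With a fixed weight of 0.25, the mixed sample takes a quarter of the first input and three quarters of the second, and the label is mixed with the same weight.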
-
ActiveZero: Mixed Domain Learning for Active Stereovision with Zero Annotation
Authors:
Isabella Liu,
Edward Yang,
Jianyu Tao,
Rui Chen,
Xiaoshuai Zhang,
Qing Ran,
Zhu Liu,
Hao Su
Abstract:
Traditional depth sensors generate accurate real world depth estimates that surpass even the most advanced learning approaches trained only on simulation domains. Since ground truth depth is readily available in the simulation domain but quite difficult to obtain in the real domain, we propose a method that leverages the best of both worlds. In this paper we present a new framework, ActiveZero, which is a mixed domain learning solution for active stereovision systems that requires no real world depth annotation. First, we demonstrate the transferability of our method to out-of-distribution real data by using a mixed domain learning strategy. In the simulation domain, we use a combination of supervised disparity loss and self-supervised losses on a shape primitives dataset. By contrast, in the real domain, we only use self-supervised losses on a dataset that is out-of-distribution from either training simulation data or test real data. Second, our method introduces a novel self-supervised loss called temporal IR reprojection to increase the robustness and accuracy of our reprojections in hard-to-perceive regions. Finally, we show how the method can be trained end-to-end and that each module is important for attaining the end result. Extensive qualitative and quantitative evaluations on real data demonstrate state of the art results that can even beat a commercial depth sensor.
Submitted 5 December, 2021;
originally announced December 2021.
-
Fighting Fire with Fire: Contrastive Debiasing without Bias-free Data via Generative Bias-transformation
Authors:
Yeonsung Jung,
Hajin Shim,
June Yong Yang,
Eunho Yang
Abstract:
Deep neural networks (DNNs), despite their impressive ability to generalize over-capacity networks, often rely heavily on malignant bias as shortcuts instead of task-related information for discriminative tasks. To address this problem, recent studies utilize auxiliary information related to the bias, which is rarely obtainable in practice, or sift through a handful of bias-free samples for debiasing. However, the success of these methods is not always guaranteed due to the unfulfilled presumptions. In this paper, we propose a novel method, Contrastive Debiasing via Generative Bias-transformation (CDvG), which works without explicit bias labels or bias-free samples. Motivated by our observation that not only discriminative models but also image translation models tend to focus on the malignant bias, CDvG employs an image translation model to transform one bias mode into another while preserving the task-relevant information. Additionally, the bias-transformed views are set against each other through contrastive learning to learn bias-invariant representations. Our method demonstrates superior performance compared to prior approaches, especially when bias-free samples are scarce or absent. Furthermore, CDvG can be integrated with the methods that focus on bias-free samples in a plug-and-play manner for additional enhancements, as demonstrated by diverse experimental results.
Submitted 5 July, 2023; v1 submitted 2 December, 2021;
originally announced December 2021.
-
BICEP Array: 150 GHz detector module development
Authors:
A. Schillaci,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
D. Beck,
J. J. Bock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
E. Denison,
M. Dierickx,
L. Duband,
M. Eiben,
S. Fatigoni,
J. P. Filippini,
C. Giannakopoulos,
N. Goeckner-Wald,
D. Goldfinger,
J. A. Grayson
, et al. (59 additional authors not shown)
Abstract:
The BICEP/Keck Collaboration is currently leading the quest to the highest sensitivity measurements of the polarized CMB anisotropies on degree scale with a series of cryogenic telescopes, of which BICEP Array is the latest Stage-3 upgrade with a total of $\sim32,000$ detectors. The instrument comprises 4 receivers spanning 30 to 270 GHz, with the low-frequency 30/40 GHz deployed to the South Pole Station in late 2019. The full complement of receivers is forecast to set the most stringent constraints on the tensor to scalar ratio $r$. Building on these advances, the overarching small-aperture telescope concept is already being used as the reference for further Stage-4 experiment design.
In this paper I will present the development of the BICEP Array 150 GHz detector module and its fabrication requirements, with highlights on the high-density time division multiplexing (TDM) design of the cryogenic circuit boards. The low-impedance wiring required between the detectors and the first-stage SQUID amplifiers is crucial to maintain a stiff voltage bias on the detectors. A novel multi-layer FR4 Printed Circuit Board (PCB) with superconducting traces, capable of reading out up to 648 detectors, is presented along with its validation tests.
I will also describe an ultra-high density TDM detector module we developed for a CMB-S4-like experiment that allows up to 1,920 detectors to be read out. TDM has been chosen as the detector readout technology for the Cosmic Microwave Background Stage-4 (CMB-S4) experiment based on its proven low-noise performance, predictable costs and overall maturity of the architecture. The heritage for TDM is rooted in mm- and submm-wave experiments dating back 20 years and has since evolved to support a multiplexing factor of 64x in Stage-3 experiments.
Submitted 29 November, 2021;
originally announced November 2021.
-
Testing thresholds for high-dimensional sparse random geometric graphs
Authors:
Siqi Liu,
Sidhanth Mohanty,
Tselil Schramm,
Elizabeth Yang
Abstract:
In the random geometric graph model $\mathsf{Geo}_d(n,p)$, we identify each of our $n$ vertices with an independently and uniformly sampled vector from the $d$-dimensional unit sphere, and we connect pairs of vertices whose vectors are ``sufficiently close'', such that the marginal probability of an edge is $p$.
We investigate the problem of testing for this latent geometry, or in other words, distinguishing an Erdős-Rényi graph $\mathsf{G}(n, p)$ from a random geometric graph $\mathsf{Geo}_d(n, p)$. It is not too difficult to show that if $d\to \infty$ while $n$ is held fixed, the two distributions become indistinguishable; we wish to understand how fast $d$ must grow as a function of $n$ for indistinguishability to occur.
When $p = \frac{α}{n}$ for constant $α$, we prove that if $d \ge \mathrm{polylog} n$, the total variation distance between the two distributions is close to $0$; this improves upon the best previous bound of Brennan, Bresler, and Nagaraj (2020), which required $d \gg n^{3/2}$, and further our result is nearly tight, resolving a conjecture of Bubeck, Ding, Eldan, \& Rácz (2016) up to logarithmic factors. We also obtain improved upper bounds on the statistical indistinguishability thresholds in $d$ for the full range of $p$ satisfying $\frac{1}{n}\le p\le \frac{1}{2}$, improving upon the previous bounds by polynomial factors.
Our analysis uses the Belief Propagation algorithm to characterize the distributions of (subsets of) the random vectors {\em conditioned on producing a particular graph}. In this sense, our analysis is connected to the ``cavity method'' from statistical physics. To analyze this process, we rely on novel sharp estimates for the area of the intersection of a random sphere cap with an arbitrary subset of the sphere, which we prove using optimal transport maps and entropy-transport inequalities on the unit sphere.
Submitted 22 November, 2021;
originally announced November 2021.
-
Radiative Pattern of Intralayer and Interlayer Excitons in Two-Dimensional WS2/WSe2 Heterostructure
Authors:
Mohammed Adel Aly,
Manan Shah,
Lorenz Maximilian Schneider,
Kyungnam Kang,
Martin Koch,
Eui-Hyeok Yang,
Arash Rahimi-Iman
Abstract:
Two-dimensional (2D) heterostructures (HS) formed by transition-metal dichalcogenide (TMDC) monolayers offer a unique platform for the study of intralayer and interlayer excitons as well as moiré-pattern-induced features. Particularly, the dipolar charge-transfer exciton comprising an electron and a hole, which are confined to separate layers of 2D semiconductors and Coulomb-bound across the heterojunction interface, has drawn considerable attention in the research community. On the one hand, it bears significance for optoelectronic devices, e.g. in terms of charge carrier extraction from photovoltaic devices. On the other hand, its spatially indirect nature and correspondingly high longevity among excitons as well as its out-of-plane dipole orientation render it attractive for excitonic Bose-Einstein condensation studies, which address collective coherence effects, and for photonic integration schemes with TMDCs. Here, we demonstrate the interlayer excitons' out-of-plane dipole orientation through angle-resolved spectroscopy of the HS photoluminescence at cryogenic temperatures, employing a tungsten-based TMDC HS. Within the measurable light cone, the directly-obtained radiation profile of this species clearly resembles that of an in-plane emitter which deviates from that of the intralayer bright excitons as well as the other excitonic HS features recently attributed to artificial superlattices formed by moiré patterns.
Submitted 12 November, 2021;
originally announced November 2021.
-
Graph Transplant: Node Saliency-Guided Graph Mixup with Local Structure Preservation
Authors:
Joonhyung Park,
Hajin Shim,
Eunho Yang
Abstract:
Graph-structured datasets usually have irregular graph sizes and connectivities, rendering the use of recent data augmentation techniques, such as Mixup, difficult. To tackle this challenge, we present the first Mixup-like graph augmentation method at the graph-level called Graph Transplant, which mixes irregular graphs in data space. To be well defined on various scales of the graph, our method identifies the sub-structure as a mix unit that can preserve the local information. Since the mixup-based methods without special consideration of the context are prone to generate noisy samples, our method explicitly employs the node saliency information to select meaningful subgraphs and adaptively determine the labels. We extensively validate our method with diverse GNN architectures on multiple graph classification benchmark datasets from a wide range of graph domains of different sizes. Experimental results show the consistent superiority of our method over other basic data augmentation baselines. We also demonstrate that Graph Transplant enhances the performance in terms of robustness and model calibration.
Submitted 19 December, 2021; v1 submitted 10 November, 2021;
originally announced November 2021.
-
Deep Convolution Network Based Emotion Analysis for Automatic Detection of Mild Cognitive Impairment in the Elderly
Authors:
Zixiang Fei,
Erfu Yang,
Leijian Yu,
Xia Li,
Huiyu Zhou,
Wenju Zhou
Abstract:
A significant number of people around the world suffer from cognitive impairment. Early detection of cognitive impairment is of great importance to both patients and caregivers. However, existing approaches have shortcomings, such as the time and financial expense involved in clinic visits and neuroimaging. It has been found that patients with cognitive impairment show abnormal emotion patterns. In this paper, we present a novel deep convolution network-based system to detect cognitive impairment through the analysis of the evolution of facial emotions while participants are watching designed video stimuli. In our proposed system, a novel facial expression recognition algorithm is developed using layers from MobileNet and a Support Vector Machine (SVM), which showed satisfactory performance on 3 datasets. To verify the proposed system in detecting cognitive impairment, 61 elderly people, including patients with cognitive impairment and healthy people as a control group, were invited to participate in the experiments, and a dataset was built accordingly. With this dataset, the proposed system achieved a detection accuracy of 73.3%.
Submitted 9 November, 2021;
originally announced November 2021.
-
TorchAudio: Building Blocks for Audio and Speech Processing
Authors:
Yao-Yuan Yang,
Moto Hira,
Zhaoheng Ni,
Anjali Chourdia,
Artyom Astafurov,
Caroline Chen,
Ching-Feng Yeh,
Christian Puhrsch,
David Pollack,
Dmitriy Genzel,
Donny Greenberg,
Edward Z. Yang,
Jason Lian,
Jay Mahadeokar,
Jeff Hwang,
Ji Chen,
Peter Goldsborough,
Prabhat Roy,
Sean Narenthiran,
Shinji Watanabe,
Soumith Chintala,
Vincent Quenneville-Bélair,
Yangyang Shi
Abstract:
This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically differentiable, and production-ready. TorchAudio can be easily installed from the Python Package Index repository and the source code is publicly available under a BSD-2-Clause License (as of September 2021) at https://github.com/pytorch/audio. In this document, we provide an overview of the design principles, functionalities, and benchmarks of TorchAudio. We also benchmark our implementation of several audio and speech operations and models. We verify through the benchmarks that our implementations of various operations and models are valid and perform similarly to other publicly available implementations.
Submitted 16 February, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Authors:
Hae Beom Lee,
Hayeon Lee,
Jaewoong Shin,
Eunho Yang,
Timothy Hospedales,
Sung Ju Hwang
Abstract:
Many gradient-based meta-learning methods assume a set of parameters that do not participate in inner-optimization, which can be considered as hyperparameters. Although such hyperparameters can be optimized using the existing gradient-based hyperparameter optimization (HO) methods, they suffer from the following issues. Unrolled differentiation methods do not scale well to high-dimensional hyperparameters or horizon length, Implicit Function Theorem (IFT) based methods are restrictive for online optimization, and short horizon approximations suffer from short horizon bias. In this work, we propose a novel HO method that can overcome these limitations, by approximating the second-order term with knowledge distillation. Specifically, we parameterize a single Jacobian-vector product (JVP) for each HO step and minimize the distance from the true second-order term. Our method allows online optimization and also is scalable to the hyperparameter dimension and the horizon length. We demonstrate the effectiveness of our method on two different meta-learning methods and three benchmark datasets.
Submitted 11 February, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Cyclic Base Ordering of Graphs
Authors:
Jessica Li,
Eric Yang,
William Zhang
Abstract:
A cyclic base ordering of a connected graph $G$ is a cyclic ordering of $E(G)$ such that every cyclically consecutive $|V(G)|-1$ edges form a spanning tree. In this project, we study cyclic base orderings of various families of graphs, including squares of cycles, wheel graphs, generalized wheel graphs and broken wheel graphs, fan and broken fan graphs, prism graphs, and maximal 2-degenerate graphs. We also provide a polynomial time algorithm to verify whether any given edge ordering is a cyclic base ordering.
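The definition above can be checked directly. The following is a minimal sketch of one straightforward polynomial-time check, not necessarily the authors' algorithm: it slides a window of $|V(G)|-1$ cyclically consecutive edges around the ordering and tests each window with a union-find structure. The function names and the convention that vertices are labelled `0..n-1` are assumptions made for illustration.

```python
def is_spanning_tree(num_vertices, edges):
    """True if `edges` has exactly n-1 edges and contains no cycle (union-find)."""
    if len(edges) != num_vertices - 1:
        return False
    parent = list(range(num_vertices))

    def find(x):
        # Find the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True  # n-1 acyclic edges on n vertices form a spanning tree


def is_cyclic_base_ordering(num_vertices, ordering):
    """Check every window of n-1 cyclically consecutive edges in `ordering`."""
    m = len(ordering)
    k = num_vertices - 1
    if m < k:
        return False
    return all(
        is_spanning_tree(num_vertices, [ordering[(start + i) % m] for i in range(k)])
        for start in range(m)
    )


# Usage: the 4-cycle in its natural order passes; an ordering of K4 that
# starts with the three edges of a triangle fails.
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4_bad = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3), (1, 3)]
print(is_cyclic_base_ordering(4, four_cycle), is_cyclic_base_ordering(4, k4_bad))
```

With $m$ edges and $n$ vertices this sketch performs $m$ window checks of $n-1$ union operations each, so it runs in polynomial time.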
Submitted 2 October, 2021;
originally announced October 2021.
-
Distilling Linguistic Context for Language Model Compression
Authors:
Geondo Park,
Gyeongman Kim,
Eunho Yang
Abstract:
A computationally expensive and memory intensive neural network lies behind the recent success of language representation learning. Knowledge distillation, a major technique for deploying such a vast language model in resource-scarce environments, transfers the knowledge on individual word representations learned without restrictions. In this paper, inspired by the recent observations that language representations are relatively positioned and have more semantic knowledge as a whole, we present a new knowledge distillation objective for language representation learning that transfers the contextual knowledge via two types of relationships across representations: Word Relation and Layer Transforming Relation. Unlike other recent distillation techniques for the language models, our contextual distillation does not have any restrictions on architectural changes between teacher and student. We validate the effectiveness of our method on challenging benchmarks of language understanding tasks, not only in architectures of various sizes, but also in combination with DynaBERT, the recently proposed adaptive size pruning method.
Submitted 17 September, 2021;
originally announced September 2021.
-
Domain Sparsification of Discrete Distributions using Entropic Independence
Authors:
Nima Anari,
Michał Dereziński,
Thuy-Duong Vuong,
Elizabeth Yang
Abstract:
We present a framework for speeding up the time it takes to sample from discrete distributions $μ$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime $k\ll n$. We show that having estimates of marginals $\mathbb{P}_{S\sim μ}[i\in S]$, the task of sampling from $μ$ can be reduced to sampling from distributions $ν$ supported on size $k$ subsets of a ground set of only $n^{1-α}\cdot \operatorname{poly}(k)$ elements. Here, $1/α\in [1, k]$ is the parameter of entropic independence for $μ$. Further, the sparsified distributions $ν$ are obtained by applying a sparse (mostly $0$) external field to $μ$, an operation that often retains algorithmic tractability of sampling from $ν$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $μ$, and in return reduce the amortized cost needed to produce many samples from the distribution $μ$, as is often needed in upstream tasks such as counting and inference.
For a wide range of distributions where $α=Ω(1)$, our result reduces the domain size, and as a corollary, the cost-per-sample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $α=1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over $O(1/k)$ relative error established in prior work.
Submitted 14 September, 2021; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
Authors:
Jung Hyun Lee,
Jihun Yun,
Sung Ju Hwang,
Eunho Yang
Abstract:
Network quantization, which aims to reduce the bit-lengths of network weights and activations, has emerged as a way to deploy networks on resource-limited devices. Although recent studies have successfully discretized a full-precision network, they still incur large quantization errors after training, thus giving rise to a significant performance gap between a full-precision network and its quantized counterpart. In this work, we propose a novel quantization method for neural networks, Cluster-Promoting Quantization (CPQ), which finds the optimal quantization grids while naturally encouraging the underlying full-precision weights to gather around those quantization grids cohesively during training. This property of CPQ stems from two main ingredients that enable differentiable quantization: i) the use of the categorical distribution designed by a specific probabilistic parametrization in the forward pass and ii) our proposed multi-class straight-through estimator (STE) in the backward pass. Since our second component, multi-class STE, is intrinsically biased, we additionally propose a new bit-drop technique, DropBits, that revises the standard dropout regularization to randomly drop bits instead of neurons. As a natural extension of DropBits, we further introduce a way of learning heterogeneous quantization levels to find the proper bit-length for each layer by imposing an additional regularization on DropBits. We experimentally validate our method on various benchmark datasets and network architectures, and also support a new hypothesis for quantization: learning heterogeneous quantization levels outperforms the case using the same but fixed quantization levels from scratch.
Submitted 5 September, 2021;
originally announced September 2021.
-
TAR on Social Media: A Framework for Online Content Moderation
Authors:
Eugene Yang,
David D. Lewis,
Ophir Frieder
Abstract:
Content moderation (removing or limiting the distribution of posts based on their contents) is one tool social networks use to fight problems such as harassment and disinformation. Manually screening all content is usually impractical given the scale of social media data, and the need for nuanced human interpretations makes fully automated approaches infeasible. We consider content moderation from the perspective of technology-assisted review (TAR): a human-in-the-loop active learning approach developed for high recall retrieval problems in civil litigation and other fields. We show how TAR workflows, and a TAR cost model, can be adapted to the content moderation problem. We then demonstrate on two publicly available content moderation data sets that a TAR workflow can reduce moderation costs by 20% to 55% across a variety of conditions.
Submitted 29 August, 2021;
originally announced August 2021.
-
Certifying One-Phase Technology-Assisted Reviews
Authors:
David D. Lewis,
Eugene Yang,
Ophir Frieder
Abstract:
Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping rules for one-phase TAR. We further show theoretically and empirically that overshooting a recall target, which has been treated as innocuous or desirable in past evaluations of stopping rules, is a major source of excess cost in one-phase TAR workflows. Counterintuitively, incurring a larger sampling cost to reduce excess recall leads to lower total cost in almost all scenarios.
Submitted 29 August, 2021;
originally announced August 2021.
-
The Role of Local Structure in the Enhanced Dynamics of Deformed Glasses
Authors:
Entao Yang,
Robert A. Riggleman
Abstract:
External stress can accelerate molecular mobility of amorphous solids by several orders of magnitude. The changes in mobility are commonly interpreted through the Eyring model, which invokes an empirical activation volume whose origin remains poorly understood. Here, we analyze constant-stress molecular dynamics simulations and propose an extension of the Eyring model with a machine-learned field, softness. Our model connects the activation volume, an empirical parameter, to a structural property (softness). We show that stress has an inhomogeneous effect on the mobility that depends on local structure, which explains the narrower distribution of relaxation time observed under stress.
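For orientation, a commonly quoted simplified form of the Eyring picture mentioned above writes the relaxation time under an applied stress $\sigma$ as shown below. This expression is not taken from the paper; the symbols $\tau_0$, $\Delta E$, $V^{*}$, $k_B$, and $T$ are notation assumed here for illustration.

```latex
\tau(\sigma) \;=\; \tau_0 \exp\!\left(\frac{\Delta E - \sigma V^{*}}{k_B T}\right)
```

In this form $V^{*}$ plays the role of the empirical activation volume: a larger product $\sigma V^{*}$ lowers the effective barrier and shortens $\tau$. The abstract does not state how the proposed extension makes this quantity depend on softness.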
Submitted 12 August, 2021;
originally announced August 2021.
-
Some results on the motivic nearby cycle
Authors:
Fangzhou Jin,
Enlin Yang
Abstract:
We extend Ayoub's formalism of motivic nearby cycle functor to the $\infty$-categorical level, and prove some desired cohomological properties by relating the motivic nearby cycle functor to the notion of local acyclicity in motivic homotopy.
Submitted 20 August, 2022; v1 submitted 18 July, 2021;
originally announced July 2021.
-
FedMix: Approximation of Mixup under Mean Augmented Federated Learning
Authors:
Tehrim Yoon,
Sumin Shin,
Sung Ju Hwang,
Eunho Yang
Abstract:
Federated learning (FL) allows edge devices to collectively learn a model without directly sharing data within each device, thus preserving privacy and eliminating the need to store data globally. While there are promising results under the assumption of independent and identically distributed (iid) local data, current state-of-the-art algorithms suffer from performance degradation as the heterogeneity of local data across clients increases. To resolve this issue, we propose a simple framework, Mean Augmented Federated Learning (MAFL), where clients send and receive averaged local data, subject to the privacy requirements of target applications. Under our framework, we propose a new augmentation algorithm, named FedMix, which is inspired by a phenomenal yet simple data augmentation method, Mixup, but does not require local raw data to be directly shared among devices. Our method shows greatly improved performance in the standard benchmark datasets of FL, under highly non-iid federated settings, compared to conventional algorithms.
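As background for the Mixup method that FedMix builds on, here is a minimal sketch of standard Mixup, which forms a convex combination of two examples and their labels. It is not FedMix itself, which approximates this operation without sharing raw data; the function names, the `alpha` parameter, and the use of NumPy arrays with one-hot labels are assumptions made for illustration.

```python
import numpy as np


def mixup(x_i, y_i, x_j, y_j, lam):
    """Blend two (input, label) pairs with mixing weight lam in [0, 1]."""
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y


def sample_lambda(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha), as is usual for Mixup."""
    return rng.beta(alpha, alpha)


# Usage: mix two toy examples that carry one-hot labels.
rng = np.random.default_rng(0)
lam = sample_lambda(alpha=1.0, rng=rng)
x_mixed, y_mixed = mixup(
    np.array([1.0, 0.0]), np.array([1.0, 0.0]),
    np.array([0.0, 1.0]), np.array([0.0, 1.0]),
    lam,
)
print(x_mixed, y_mixed)
```

Because the same weight is applied to inputs and labels, the mixed label remains a valid probability vector whenever the original labels are.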
Submitted 1 July, 2021;
originally announced July 2021.
-
Understanding and Improving Early Stopping for Learning with Noisy Labels
Authors:
Yingbin Bai,
Erkun Yang,
Bo Han,
Yanhua Yang,
Jiatong Li,
Yinian Mao,
Gang Niu,
Tongliang Liu
Abstract:
The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at the early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering a DNN as a whole. However, a DNN can be considered as a composition of a series of layers, and we find that the latter layers in a DNN are much more sensitive to label noise, while their former counterparts are quite robust. Therefore, selecting a stopping point for the whole network may cause different DNN layers to antagonistically affect each other, thus degrading the final performance. In this paper, we propose to separate a DNN into different parts and progressively train them to address this problem. Instead of early stopping, which trains a whole DNN all at once, we initially train former DNN layers by optimizing the DNN with a relatively large number of epochs. During training, we progressively train the latter DNN layers by using a smaller number of epochs with the preceding layers fixed to counteract the impact of noisy labels. We term the proposed method progressive early stopping (PES). Despite its simplicity, compared with early stopping, PES can help to obtain more promising and stable results. Furthermore, by combining PES with existing approaches on noisy label training, we achieve state-of-the-art performance on image classification benchmarks.
Submitted 26 December, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Heuristic Stopping Rules For Technology-Assisted Review
Authors:
Eugene Yang,
David D. Lewis,
Ophir Frieder
Abstract:
Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e. recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of proposed heuristics and find they are accurate at hitting a range of recall targets while substantially reducing review costs.
Submitted 17 June, 2021;
originally announced June 2021.
-
On Minimizing Cost in Legal Document Review Workflows
Authors:
Eugene Yang,
David D. Lewis,
Ophir Frieder
Abstract:
Technology-assisted review (TAR) refers to human-in-the-loop machine learning workflows for document review in legal discovery and other high recall review tasks. Attorneys and legal technologists have debated whether review should be a single iterative process (one-phase TAR workflows) or whether model training and review should be separate (two-phase TAR workflows), with implications for the choice of active learning algorithm. The relative cost of manual labeling for different purposes (training vs. review) and of different documents (positive vs. negative examples) is a key and neglected factor in this debate. Using a novel cost dynamics analysis, we show analytically and empirically that these relative costs strongly impact whether a one-phase or two-phase workflow minimizes cost. We also show how category prevalence, classification task difficulty, and collection size impact the optimal choice not only of workflow type, but of active learning method and stopping point.
Submitted 17 June, 2021;
originally announced June 2021.
-
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Authors:
Dongchan Min,
Dong Bok Lee,
Eunho Yang,
Sung Ju Hwang
Abstract:
With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech from only a few short audio samples of the given speaker. However, existing methods either require fine-tuning the model or achieve low adaptation quality without fine-tuning. In this work, we propose StyleSpeech, a new TTS model which not only synthesizes high-quality speech but also effectively adapts to new speakers. Specifically, we propose Style-Adaptive Layer Normalization (SALN), which aligns the gain and bias of the text input according to the style extracted from a reference speech audio. With SALN, our model effectively synthesizes speech in the style of the target speaker even from a single speech audio sample. Furthermore, to enhance StyleSpeech's adaptation to speech from new speakers, we extend it to Meta-StyleSpeech by introducing two discriminators trained with style prototypes, and performing episodic training. The experimental results show that our models generate high-quality speech which accurately follows the speaker's voice from a single short-duration (1-3 sec) speech audio sample, significantly outperforming baselines.
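The SALN idea described in this abstract (layer normalization whose gain and bias come from a style vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the affine projections from style to gain and bias, and every name below (`layer_norm`, `affine`, `saln`, `w_gain`, `b_gain`, `w_bias`, `b_bias`), are assumptions made for illustration only.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def layer_norm(h: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and (roughly) unit variance."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]


def affine(weight: Matrix, offset: Sequence[float], style: Sequence[float]) -> List[float]:
    """Hypothetical projection: one output per row of `weight`, plus an offset."""
    return [sum(w * s for w, s in zip(row, style)) + b for row, b in zip(weight, offset)]


def saln(h, style, w_gain, b_gain, w_bias, b_bias) -> List[float]:
    """Style-adaptive layer norm sketch: gain and bias are functions of `style`
    (assumed affine here) instead of fixed learned parameters."""
    gain = affine(w_gain, b_gain, style)
    bias = affine(w_bias, b_bias, style)
    return [g * n + b for g, n, b in zip(gain, layer_norm(h), bias)]


# Toy inputs: a 3-dimensional hidden vector and a 2-dimensional style vector.
h = [1.0, 2.0, 3.0]
style = [0.5, -0.5]
zeros = [[0.0, 0.0] for _ in h]

# With zero projection weights, gain = 1 and bias = 0, so SALN reduces to
# plain layer normalization regardless of the style vector.
out = saln(h, style, zeros, [1.0] * 3, zeros, [0.0] * 3)
```

With non-zero projection weights, the same hidden vector is scaled and shifted differently for each style vector, which is the sense in which the normalization adapts to the reference speaker.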
Submitted 16 June, 2021; v1 submitted 6 June, 2021;
originally announced June 2021.
-
Online Coreset Selection for Rehearsal-based Continual Learning
Authors:
Jaehong Yoon,
Divyam Madaan,
Eunho Yang,
Sung Ju Hwang
Abstract:
A dataset is a shred of crucial evidence to describe a task. However, each data point in the dataset does not have the same potential, as some of the data points can be more representative or informative than others. This unequal importance among the data points may have a large impact in rehearsal-based continual learning, where we store a subset of the training examples (coreset) to be replayed later to alleviate catastrophic forgetting. In continual learning, the quality of the samples stored in the coreset directly affects the model's effectiveness and efficiency. The coreset selection problem becomes even more important under realistic settings, such as imbalanced continual learning or noisy data scenarios. To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration and trains them in an online manner. Our proposed method maximizes the model's adaptation to a current dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting. We validate the effectiveness of our coreset selection mechanism over various standard, imbalanced, and noisy datasets against strong continual learning baselines, demonstrating that it improves task adaptation and prevents catastrophic forgetting in a sample-efficient manner.
Submitted 18 March, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network
Authors:
Shuo Yang,
Erkun Yang,
Bo Han,
Yang Liu,
Min Xu,
Gang Niu,
Tongliang Liu
Abstract:
In label-noise learning, estimating the transition matrix is a hot topic as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., clean-label transition matrix (CLTM)) has been widely exploited to learn a clean label classifier by employing the noisy data. Motivated by the observation that classifiers mostly output Bayes optimal labels for prediction, in this paper we study directly modeling the transition from Bayes optimal labels to noisy labels (i.e., Bayes-label transition matrix (BLTM)) and learning a classifier to predict Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. Favorably, however, Bayes optimal labels have less uncertainty than clean labels, i.e., the class posteriors of Bayes optimal labels are one-hot vectors while those of clean labels are not. This yields two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of noisy data; (b) the feasible solution space is much smaller. By exploiting these advantages, we estimate the BLTM parametrically by employing a deep neural network, leading to better generalization and superior classification performance.
Submitted 14 July, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review
Authors:
Eugene Yang,
Sean MacAvaney,
David D. Lewis,
Ophir Frieder
Abstract:
Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models such as logistic regression to lexical features. Transformer-based models with supervised tuning are known to improve effectiveness on many text classification tasks, suggesting their use in TAR. We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we likewise determined that linear models outperform BERT for simulated legal discovery topics on the Jeb Bush e-mail collection. This suggests the match between transformer pre-training corpora and the task domain is of greater significance than generally appreciated. Additionally, we show that just-right language model fine-tuning on the task collection before starting active learning is critical. Too little or too much fine-tuning hinders performance, worse than that of linear models, even for a favorable corpus such as RCV1-v2.
Submitted 19 January, 2022; v1 submitted 3 May, 2021;
originally announced May 2021.
-
RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning
Authors:
Hankook Lee,
Sungsoo Ahn,
Seung-Woo Seo,
You Young Song,
Eunho Yang,
Sung-Ju Hwang,
Jinwoo Shin
Abstract:
Retrosynthesis, of which the goal is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While the existing approaches have shown promising results, they currently lack the ability to consider availability (e.g., stability or purchasability) of the reactants or generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates the issues by reformulating retrosynthesis into a selection problem of reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), for enumerating all of the candidate molecules based on selection scores computed by graph neural networks. For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, our RetCL achieves top-1 exact match accuracy of $71.3\%$ for the USPTO-50k benchmark, while a recent transformer-based approach achieves $59.6\%$. We also demonstrate that RetCL generalizes well to unseen templates in various settings in contrast to template-based approaches.
Submitted 3 June, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks
Authors:
Hossam Amer,
Ahmed H. Salamah,
Ahmad Sajedi,
En-hui Yang
Abstract:
Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely-used convolution algorithms, im2col and MEC, produce a lowered matrix from an activation map by redundantly storing the map's elements included at horizontal and/or vertical kernel overlappings without considering the sparsity of the map. Using the sparsity of the map, this paper proposes two new convolution algorithms dubbed Compressed Pattern Overlap (CPO) and Compressed Pattern Sets (CPS) that simultaneously decrease the memory footprint and increase the inference speed while preserving the accuracy. CPO recognizes non-zero elements (NZEs) at horizontal and vertical overlappings in the activation maps. CPS further improves the memory savings of CPO by compressing the index positions of neighboring NZEs. In both algorithms, channels/regions of the activation maps with all zeros are skipped. Then, CPO/CPS performs convolution via Sparse Matrix-Vector Multiplication (SpMv) done on their sparse representations. Experimental results conducted on CPUs show that average per-layer time savings reach up to 63% and Compression Ratio (CR) up to 26x with respect to im2col. In some layers, our average per-layer CPO/CPS time savings are better by 28% and CR is better by 9.2x than the parallel implementation of MEC. For a given CNN's inference, we select offline, for each convolution layer, the fastest convolution algorithm among CPO, CPS, and im2col. Our algorithms were selected for up to 56% of the non-pointwise convolutional layers. Our offline selections yield CNN inference time savings up to 9% and CR up to 10x.
Submitted 16 April, 2021;
originally announced April 2021.
-
Mutually-Constrained Monotonic Multihead Attention for Online ASR
Authors:
Jaeyun Song,
Hajin Shim,
Eunho Yang
Abstract:
Despite the feature of real-time decoding, Monotonic Multihead Attention (MMA) shows comparable performance to the state-of-the-art offline methods in machine translation and automatic speech recognition (ASR) tasks. However, the latency of MMA is still a major issue in ASR and should be combined with a technique that can reduce the test latency at inference time, such as head-synchronous beam search decoding, which forces all non-activated heads to activate after a small fixed delay from the first head activation. In this paper, we remove the discrepancy between training and test phases by considering, in the training of MMA, the interactions across multiple heads that will occur in the test time. Specifically, we derive the expected alignments from monotonic attention by considering the boundaries of other heads and reflect them in the learning process. We validate our proposed method on the two standard benchmark datasets for ASR and show that our approach, MMA with the mutually-constrained heads from the training stage, provides better performance than baselines.
Submitted 26 March, 2021;
originally announced March 2021.
-
Learning Polar Encodings for Arbitrary-Oriented Ship Detection in SAR Images
Authors:
Yishan He,
Fei Gao,
Jun Wang,
Amir Hussain,
Erfu Yang,
Huiyu Zhou
Abstract:
Common horizontal bounding box (HBB)-based methods are not capable of accurately locating slender ship targets with arbitrary orientations in synthetic aperture radar (SAR) images. Therefore, in recent years, methods based on oriented bounding boxes (OBBs) have gradually received attention from researchers. However, most of the recently proposed deep learning-based methods for OBB detection encounter the boundary discontinuity problem in angle or key point regression. To alleviate this problem, researchers have proposed introducing manually set parameters or extra network branches for distinguishing the boundary cases, which make training more difficult and lead to performance degradation. In this paper, in order to solve the boundary discontinuity problem in OBB regression, we propose to detect SAR ships by learning polar encodings. The encoding scheme uses a group of vectors pointing from the center of the ship target to the boundary points to represent an OBB. The boundary discontinuity problem is avoided by training and inference directly according to the polar encodings. In addition, we propose an Intersection over Union (IOU)-weighted regression loss, which further guides the training of polar encodings through the IOU metric and improves the detection performance. Experiments on the Rotating SAR Ship Detection Dataset (RSSDD) show that the proposed method achieves better detection performance than other comparison algorithms and other OBB encoding schemes, demonstrating the effectiveness of our method.
Submitted 24 March, 2021;
originally announced March 2021.
-
ToxCCIn: Toxic Content Classification with Interpretability
Authors:
Tong Xiang,
Sean MacAvaney,
Eugene Yang,
Nazli Goharian
Abstract:
Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans. Explanations are particularly important for tasks like offensive language or toxicity detection on social media because a manual appeal process is often in place to dispute automatically flagged content. In this work, we propose a technique to improve the interpretability of these models, based on a simple and powerful assumption: a post is at least as toxic as its most toxic span. We incorporate this assumption into transformer models by scoring a post based on the maximum toxicity of its spans and augmenting the training process to identify correct spans. We find this approach effective and can produce explanations that exceed the quality of those provided by Logistic Regression analysis (often regarded as a highly-interpretable model), according to a human study.
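The assumption stated in this abstract, that a post is at least as toxic as its most toxic span, amounts to max-pooling over span scores. The following is a minimal sketch of that scoring rule only, not the authors' model: the function name `score_post` and the example scores are hypothetical, and the per-span scores are assumed to come from some upstream classifier.

```python
from typing import List, Tuple


def score_post(span_scores: List[float]) -> Tuple[float, int]:
    """Return (post score, index of the most toxic span).

    The post score is the maximum span score, so the selected span doubles
    as the explanation for why the post was flagged.
    """
    if not span_scores:
        raise ValueError("a post needs at least one span")
    best = max(range(len(span_scores)), key=span_scores.__getitem__)
    return span_scores[best], best


# Hypothetical per-span toxicity scores for a four-span post.
scores = [0.05, 0.10, 0.92, 0.30]
post_score, span_idx = score_post(scores)
```

Because the score is a maximum rather than an average, a single highly toxic span cannot be diluted by surrounding benign text.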
Submitted 1 March, 2021;
originally announced March 2021.
-
Model-Augmented Q-learning
Authors:
Youngmin Oh,
Jinwoo Shin,
Eunho Yang,
Sung Ju Hwang
Abstract:
In recent years, $Q$-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect the policy learning. To resolve this issue, we propose an MFRL framework that is augmented with the components of model-based RL. Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network. We further utilize the estimated reward from the model estimators for $Q$-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with the true reward. Finally, we also provide a trick to prioritize past experiences in the replay buffer by utilizing model-estimation errors. We experimentally validate MQL built upon state-of-the-art off-policy MFRL methods, and show that MQL largely improves their performance and convergence. The proposed scheme is simple to implement and does not require additional training cost.
Submitted 7 February, 2021;
originally announced February 2021.
-
Analysis of Temperature-to-Polarization Leakage in BICEP3 and Keck CMB Data from 2016 to 2018
Authors:
The BICEP/Keck Collaboration,
T. St. Germaine,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. R. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
E. Denison,
M. Dierickx,
L. Duband,
M. Eiben,
S. Fatigoni,
J. P. Filippini,
S. Fliescher
, et al. (64 additional authors not shown)
Abstract:
The BICEP/Keck Array experiment is a series of small-aperture refracting telescopes observing degree-scale Cosmic Microwave Background polarization from the South Pole in search of a primordial $B$-mode signature. As a pair differencing experiment, an important systematic that must be controlled is the differential beam response between the co-located, orthogonally polarized detectors. We use high-fidelity, in-situ measurements of the beam response to estimate the temperature-to-polarization (T $\rightarrow$ P) leakage in our latest data including observations from 2016 through 2018. This includes three years of BICEP3 observing at 95 GHz, and multifrequency data from Keck Array. Here we present band-averaged far-field beam maps, differential beam mismatch, and residual beam power (after filtering out the leading difference modes via deprojection) for these receivers. We show preliminary results of "beam map simulations," which use these beam maps to observe a simulated temperature (no $Q/U$) sky to estimate T $\rightarrow$ P leakage in our real data.
Submitted 3 February, 2021;
originally announced February 2021.
-
Few-Shot Domain Adaptation for Grammatical Error Correction via Meta-Learning
Authors:
Shengsheng Zhang,
Yaping Huang,
Yun Chen,
Liner Yang,
Chencheng Wang,
Erhong Yang
Abstract:
Most existing Grammatical Error Correction (GEC) methods based on sequence-to-sequence models mainly focus on how to generate more pseudo data to obtain better performance. Little work addresses few-shot GEC domain adaptation. In this paper, we treat different GEC domains as different GEC tasks and propose to extend meta-learning to few-shot GEC domain adaptation without using any pseudo data. We exploit a set of data-rich source domains to learn the initialization of model parameters that facilitates fast adaptation on new resource-poor target domains. We adapt a GEC model to the first language (L1) of the second language learner. To evaluate the proposed method, we use nine L1s as source domains and five L1s as target domains. Experimental results on the L1 GEC domain adaptation dataset demonstrate that the proposed approach outperforms the multi-task transfer learning baseline by 0.50 $F_{0.5}$ score on average and enables us to effectively adapt to a new L1 domain with only 200 parallel sentences.
Submitted 29 January, 2021;
originally announced January 2021.
-
Porting WarpX to GPU-accelerated platforms
Authors:
A. Myers,
A. Almgren,
L. D. Amorim,
J. Bell,
L. Fedeli,
L. Ge,
K. Gott,
D. P. Grote,
M. Hogan,
A. Huebl,
R. Jambunathan,
R. Lehe,
C. Ng,
M. Rowan,
O. Shapoval,
M. Thévenet,
J. -L. Vay,
H. Vincenti,
E. Yang,
N. Zaïm,
W. Zhang,
Y. Zhao,
E. Zoni
Abstract:
WarpX is a general purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered, lessons learned, and give current performance results on a series of relevant benchmark problems.
Submitted 2 September, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Censorship of Online Encyclopedias: Implications for NLP Models
Authors:
Eddie Yang,
Margaret E. Roberts
Abstract:
While artificial intelligence provides the backbone for many tools people use around the world, recent work has brought to attention that the algorithms powering AI are not free of politics, stereotypes, and bias. While most work in this area has focused on the ways in which AI can exacerbate existing inequalities and discrimination, very little work has studied how governments actively shape training data. We describe how censorship has affected the development of Wikipedia corpuses, text data which are regularly used for pre-trained inputs into NLP algorithms. We show that word embeddings trained on Baidu Baike, an online Chinese encyclopedia, have very different associations between adjectives and a range of concepts about democracy, freedom, collective action, equality, and people and historical events in China than its regularly blocked but uncensored counterpart - Chinese language Wikipedia. We examine the implications of these discrepancies by studying their use in downstream AI applications. Our paper shows how government repression, censorship, and self-censorship may impact training data and the applications that draw from them.
Submitted 22 January, 2021;
originally announced January 2021.