Search | arXiv e-print repository

arXiv:2406.19435 [pdf, other]

A Sanity Check for AI-generated Image Detection

Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

Abstract: With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset. Upon analysis, almost all models classify AI-generated images as real ones. Later, we propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns. Specifically, to capture the high-level semantics, we utilize CLIP to compute the visual embedding. This effectively enables the model to discern AI-generated images based on semantics or contextual information. Secondly, we select the highest frequency patches and the lowest frequency patches in the image, and compute the low-level patchwise features, aiming to detect AI-generated images by low-level artifacts, for example, noise pattern, anti-aliasing, etc. While evaluating on existing benchmarks, for example, AIGCDetectBenchmark and GenImage, AIDE achieves +3.5% and +4.6% improvements over state-of-the-art methods, and on our proposed challenging Chameleon benchmarks, it also achieves promising results, although the problem of detecting AI-generated images is far from being solved.
The dataset, code, and pre-trained models will be published at https://github.com/shilinyan99/AIDE.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Project page: https://shilinyan99.github.io/AIDE Code: https://github.com/shilinyan99/AIDE

arXiv:2406.17660 [pdf, other]

Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

Authors: Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith

Abstract: Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves performance competitive with full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU--a feat infeasible for previous methods--and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS.

Submitted 25 June, 2024; originally announced June 2024.
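
The idea of a sparse projection that turns a gradient into a structured sparse update can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the projection simply selects r rows of the gradient matrix (here by largest row norm, an assumed rule), keeps momentum only for those rows, and uses plain momentum SGD. The names `select_rows` and `sparse_projected_step` are hypothetical.

```python
import math


def select_rows(grad, r):
    """Pick the r rows with the largest L2 norm (an assumed selection rule)."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in grad]
    order = sorted(range(len(grad)), key=lambda i: norms[i], reverse=True)
    return sorted(order[:r])


def sparse_projected_step(weights, grad, rows, momentum, lr=0.1, beta=0.9):
    """Momentum step whose optimizer state has only len(rows) rows.

    Only the selected rows of `weights` change, so the update is
    structured sparse; all other rows are left untouched.
    """
    for k, i in enumerate(rows):
        for j, g in enumerate(grad[i]):
            momentum[k][j] = beta * momentum[k][j] + g
            weights[i][j] -= lr * momentum[k][j]
    return weights, momentum


# Toy 4x2 weight matrix and gradient.
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
grad = [[0.1, 0.0], [2.0, -1.0], [0.0, 0.2], [-3.0, 0.5]]

rows = select_rows(grad, r=2)
momentum = [[0.0, 0.0] for _ in rows]  # state is r x n, not m x n
weights, momentum = sparse_projected_step(weights, grad, rows, momentum)
```

In this toy setting the optimizer state holds two rows instead of four, which is the memory saving the abstract attributes to projecting gradients into a smaller space.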

arXiv:2402.14547 [pdf, other]

OmniPred: Language Models as Universal Regressors

Authors: Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen

Abstract: Over the broad landscape of experimental design, regression has been a powerful tool to accurately predict the outcome metrics of a system or model given a set of parameters, but has been traditionally restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ evaluation data from diverse real world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments demonstrate that through only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression, and if given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.

Submitted 4 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 24 pages, 10 figures. Code can be found in https://github.com/google-research/optformer/tree/main/optformer/omnipred

arXiv:2310.14563 [pdf, other]

NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation

Authors: Oliver Li, Mallika Subramanian, Arkadiy Saakyan, Sky CH-Wang, Smaranda Muresan

Abstract: Social norms fundamentally shape interpersonal communication. We present NormDial, a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures. Introducing the task of social norm observance detection, our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline by prompting large language models with a small collection of expert-annotated social norms. We show that our generated dialogues are of high quality through human evaluation and further evaluate the performance of existing large language models on this task. Our findings point towards new directions for understanding the nuances of social norms as they manifest in conversational contexts that span across languages and cultures.

Submitted 24 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Main Conference, Short Paper; Data at https://github.com/Aochong-Li/NormDial

arXiv:2305.14492 [pdf, other]

Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment

Authors: Sky CH-Wang, Arkadiy Saakyan, Oliver Li, Zhou Yu, Smaranda Muresan

Abstract: Designing systems that can reason across cultures requires that they are grounded in the norms of the contexts in which they operate. However, current research on developing computational models of social norms has primarily focused on American society. Here, we propose a novel approach to discover and compare descriptive social norms across Chinese and American cultures. We demonstrate our approach by leveraging discussions on a Chinese Q&A platform (Zhihu) and the existing SocialChemistry dataset as proxies for contrasting cultural axes, align social situations cross-culturally, and extract social norms from texts using in-context learning. Embedding Chain-of-Thought prompting in a human-AI collaborative framework, we build a high-quality dataset of 3,069 social norms aligned with social situations across Chinese and American cultures alongside corresponding free-text explanations. To test the ability of models to reason about social norms across cultures, we introduce the task of explainable social norm entailment, showing that existing models under 3B parameters have significant room for improvement in both automatic and human evaluation. Further analysis of cross-cultural norm differences based on our dataset shows empirical alignment with the social orientations framework, revealing several situational and descriptive nuances in norms across these cultures.

Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 Main Conference (Long Paper)

arXiv:2304.12180 [pdf, other]

Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies

Authors: Oscar Li, James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke Metz

Abstract: Unrolled computation graphs are prevalent throughout machine learning but present challenges to automatic differentiation (AD) gradient estimation methods when their loss functions exhibit extreme local sensitivity, discontinuity, or blackbox characteristics. In such scenarios, online evolution strategies methods are a more capable alternative, while being more parallelizable than vanilla evolution strategies (ES) by interleaving partial unrolls and gradient updates. In this work, we propose a general class of unbiased online evolution strategies methods. We analytically and empirically characterize the variance of this class of gradient estimators and identify the one with the least variance, which we term Noise-Reuse Evolution Strategies (NRES). Experimentally, we show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of unroll steps across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning.

Submitted 9 December, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: NeurIPS 2023. 41 pages. Code available at https://github.com/OscarcarLi/Noise-Reuse-Evolution-Strategies
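
To make "interleaving partial unrolls and gradient updates" with reused noise more concrete, here is a toy sketch under stated assumptions. It is not the authors' code: the scalar dynamical system, the per-step loss, the antithetic pair, and the rule "draw the perturbation once per episode and reuse it in every truncation window" are a simplified reading of the abstract, and every name (`unroll`, `nres_style_episode`) is hypothetical.

```python
import random


def unroll(theta, state, steps):
    """Toy dynamical system: run `steps` steps, return new state and summed loss."""
    loss = 0.0
    for _ in range(steps):
        state = 0.9 * state + theta
        loss += (state - 1.0) ** 2
    return state, loss


def nres_style_episode(theta, horizon=20, window=5, sigma=0.1, lr=0.0005, rng=random):
    """One episode of online ES with a single noise draw reused in every window."""
    eps = rng.gauss(0.0, sigma)  # sampled once per episode, then reused
    s_pos = s_neg = 0.0          # states of the antithetic pair of unrolls
    for _ in range(horizon // window):
        # Partial unroll of both perturbed parameter values.
        s_pos, loss_pos = unroll(theta + eps, s_pos, window)
        s_neg, loss_neg = unroll(theta - eps, s_neg, window)
        # Antithetic ES gradient estimate from this window's losses.
        grad_est = eps * (loss_pos - loss_neg) / (2.0 * sigma ** 2)
        theta -= lr * grad_est   # gradient update between partial unrolls
    return theta


rng = random.Random(0)
theta = 0.0
for _ in range(10):
    theta = nres_style_episode(theta, rng=rng)
```

The contrast with a vanilla ES step is that the parameter is updated after every window rather than after a full unroll, while the perturbation stays fixed for the whole episode.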

arXiv:2304.11938 [pdf, other]

Is ChatGPT the Ultimate Programming Assistant -- How far is it?

Authors: Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, Tegawendé F. Bissyandé

Abstract: Recently, the ChatGPT LLM has received great attention: it can be used as a bot for discussing source code, prompting it to suggest changes, provide descriptions or even generate code. Typical demonstrations generally focus on existing benchmarks, which may have been used in model training (i.e., data leakage). To assess the feasibility of using an LLM as a useful assistant bot for programmers, we must assess its realistic capabilities on unseen problems as well as its capabilities on various tasks. In this paper, we present an empirical study of ChatGPT's potential as a fully automated programming assistant, focusing on the tasks of code generation, program repair, and code summarization. The study investigates ChatGPT's performance on common programming problems and compares it with state-of-the-art approaches on two benchmarks. Among several findings, our study shows that ChatGPT is effective in dealing with common programming problems. However, our experiments also reveal limitations in terms of its attention span: detailed descriptions will constrain the focus of ChatGPT and prevent it from leveraging its vast knowledge to solve the actual problem. Surprisingly, we have identified the ability of ChatGPT to reason about the original intention of the code. We expect future work to build on this insight for dealing with the open question of the oracle problem.
Our findings contribute interesting insights to the development of LLMs for programming assistance, notably by demonstrating the importance of prompt engineering, and providing a better understanding of ChatGPT's practical applications for software engineering.

Submitted 31 August, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.08698 [pdf, other]

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

Authors: Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

Abstract: It is well known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match. Although transductive ZSL (TZSL) attempts to improve this by allowing the use of unlabelled examples from the unseen classes, there is still a high level of distribution shift. We propose a novel TZSL model (named Bi-VAEGAN), which largely reduces the shift by a strengthened distribution alignment between the visual and auxiliary spaces. The key proposal of the model design includes (1) a bi-directional distribution alignment, (2) a simple but effective $L_2$-norm based feature normalization approach, and (3) a more sophisticated unseen class prior estimation approach. In benchmark evaluation using four datasets, Bi-VAEGAN achieves a new state of the art under both the standard and generalized TZSL settings. Code can be found at https://github.com/Zhicaiwww/Bi-VAEGAN

Submitted 19 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: CVPR2023

arXiv:2210.09396 [pdf, other]

Affective Idiosyncratic Responses to Music

Authors: Sky CH-Wang, Evan Li, Oliver Li, Smaranda Muresan, Zhou Yu

Abstract: Affective responses to music are highly personal. Despite consensus that idiosyncratic factors play a key role in regulating how listeners emotionally respond to music, precisely measuring the marginal effects of these variables has proved challenging. To address this gap, we develop computational methods to measure affective responses to music from over 403M listener comments on a Chinese social music platform. Building on studies from music psychology in systematic and quasi-causal analyses, we test for musical, lyrical, contextual, demographic, and mental health effects that drive listener affective responses. Finally, motivated by the social phenomenon known as wǎng-yì-yún, we identify influencing factors of platform user self-disclosures, the social support they receive, and notable differences in discloser user activity.

Submitted 17 October, 2022; originally announced October 2022.

Comments: EMNLP 2022 Main Conference; see Github https://github.com/skychwang/music-emotions

arXiv:2208.01508 [pdf, other]

doi 10.1145/3583566

COMET: Coverage-guided Model Generation For Deep Learning Library Testing

Authors: Meiziniu Li, Jialun Cao, Yongqiang Tian, Tsz On Li, Ming Wen, Shing-Chi Cheung

Abstract: Recent deep learning (DL) applications are mostly built on top of DL libraries. The quality assurance of these libraries is critical to the dependable deployment of DL applications. Techniques have been proposed to generate various DL models and apply them to test these libraries. However, their test effectiveness is constrained by the diversity of layer API calls in their generated DL models. Our study reveals that these techniques can cover at most 34.1% of layer inputs, 25.9% of layer parameter values, and 15.6% of layer sequences. As a result, we find that many bugs arising from specific layer API calls (i.e., specific layer inputs, parameter values, or layer sequences) can be missed by existing techniques. Because of this limitation, we propose COMET to effectively generate DL models with diverse layer API calls for DL library testing. COMET: (1) designs a set of mutation operators and a coverage-based search algorithm to diversify layer inputs, layer parameter values, and layer sequences in DL models; (2) proposes a model synthesis method to boost the test efficiency without compromising the layer API call diversity. Our evaluation result shows that COMET outperforms baselines by covering twice as many layer inputs (69.7% vs. 34.1%), layer parameter values (50.2% vs. 25.9%), and layer sequences (39.0% vs. 15.6%) as the state of the art. Moreover, COMET covers 3.4% more library branches than existing techniques.
Finally, COMET detects 32 new bugs in the latest version of eight popular DL libraries, including TensorFlow and MXNet, with 21 of them confirmed by DL library developers and 7 of those confirmed bugs fixed by developers.

Submitted 30 January, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: 34 pages, 12 figures

ACM Class: D.2.5; I.2.5

arXiv:2201.11125 [pdf, other]

doi 10.1109/TVCG.2023.3261944

SDRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling

Authors: Yamei Tu, Olga Li, Junpeng Wang, Han-Wei Shen, Przemek Powalko, Irina Tomescu-Dubrow, Kazimierz M. Slomczynski, Spyros Blanas, J. Craig Jenkins

Abstract: Public opinion surveys constitute a powerful tool to study people's attitudes and behaviors in comparative perspectives. However, even worldwide surveys provide only partial geographic and time coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in different populations and/or years. The resulting new datasets can be analyzed as a single source, which can be flexibly accessed through many data portals. However, such portals offer little guidance to explore the data in-depth or query data with user-customized needs. As a result, it is still challenging for social scientists to efficiently identify related data for their studies and evaluate their theoretical models based on the sliced data. To overcome these challenges, in the Survey Data Recycling (SDR) international cooperation research project, we propose SDRQuerier and apply it to the harmonized SDR database, which features over two million respondents interviewed in a total of 1,721 national surveys that are part of 22 well-known international projects. We design the SDRQuerier to solve three practical challenges that social scientists routinely face. First, a BERT-based model provides customized data queries through research questions or keywords. Second, we propose a new visual design to showcase the availability of the harmonized data at different levels, thus helping users decide if empirical data exist to address a given research question.
Lastly, SDRQuerier discloses the underlying relational patterns among substantive and methodological variables in the database, to help social scientists rigorously evaluate or even improve their regression models. Through case studies with multiple social scientists in solving their daily challenges, we demonstrated the novelty and effectiveness of SDRQuerier.

Submitted 25 January, 2022; originally announced January 2022.

Journal ref: IEEE Transactions on Visualization and Computer Graphics Volume: 29, Issue: 6, 01 June 2023 pgs. 2862-2874

arXiv:2109.05797 [pdf, ps, other]

Show Me How To Revise: Improving Lexically Constrained Sentence Generation with XLNet

Authors: Xingwei He, Victor O. K. Li

Abstract: Lexically constrained sentence generation allows the incorporation of prior knowledge such as lexical constraints into the output. This technique has been applied to machine translation and dialog response generation. Previous work usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically constrained sentences, but they randomly determined the position to be edited and the action to be taken, resulting in many invalid refinements. To overcome this challenge, we used a classifier to instruct the MCMC-based models where and how to refine the candidate sentences. First, we developed two methods to create synthetic data on which the pre-trained model is fine-tuned to obtain a reliable classifier. Next, we proposed a two-step approach, "Predict and Revise", for constrained sentence generation. During the predict step, we leveraged the classifier to compute the learned prior for the candidate sentence. During the revise step, we resorted to MCMC sampling to revise the candidate sentence by conducting a sampled action at a sampled position drawn from the learned prior. We compared our proposed models with many strong baselines on two tasks, generating sentences with lexical constraints and text infilling. Experimental results have demonstrated that our proposed model performs much better than the previous work in terms of sentence fluency and diversity. Our code and pre-trained models are available at https://github.com/NLPCode/MCMCXLNet.

Submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by AAAI 2021

arXiv:2103.14587 [pdf, other]

Deep-AIR: A Hybrid CNN-LSTM Framework for Air Quality Modeling in Metropolitan Cities

Authors: Yang Han, Qi Zhang, Victor O. K. Li, Jacqueline C. K. Lam

Abstract: Air pollution has long been a serious environmental health challenge, especially in metropolitan cities, where air pollutant concentrations are exacerbated by the street canyon effect and high building density. Whilst accurately monitoring and forecasting air pollution are highly crucial, existing data-driven models fail to fully address the complex interaction between air pollution and urban dynamics. Our Deep-AIR, a novel hybrid deep learning framework that combines a convolutional neural network with a long short-term memory network, aims to address this gap to provide fine-grained city-wide air pollution estimation and station-wide forecast. Our proposed framework creates 1x1 convolution layers to strengthen the learning of cross-feature spatial interaction between air pollution and important urban dynamic features, particularly road density, building density/height, and street canyon effect. Using Hong Kong and Beijing as case studies, Deep-AIR achieves a higher accuracy than our baseline models. Our model attains an accuracy of 67.6%, 77.2%, and 66.1% in fine-grained hourly estimation, 1-hr, and 24-hr air pollution forecast for Hong Kong, and an accuracy of 65.0%, 75.3%, and 63.5% for Beijing. Our saliency analysis has revealed that for Hong Kong, street canyon and road density are the best estimators for NO2, while meteorology is the best estimator for PM2.5.

Submitted 25 March, 2021; originally announced March 2021.
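
As a small illustration of why a 1x1 convolution suits cross-feature interaction, the sketch below applies one as a per-cell linear map across channels. It is only a generic example of the operation, not the Deep-AIR architecture: the grid size, the three channel meanings, the kernel values, and the function name `conv1x1` are all assumptions made for the example.

```python
def conv1x1(feature_map, kernel, bias):
    """Apply a 1x1 convolution: a per-cell linear map across channels.

    feature_map: H x W x C_in nested lists
    kernel:      C_out x C_in weights
    bias:        C_out offsets
    """
    return [
        [
            [sum(w * x for w, x in zip(row_w, cell)) + b
             for row_w, b in zip(kernel, bias)]
            for cell in row
        ]
        for row in feature_map
    ]


# 2x2 grid with 3 hypothetical channels per cell:
# [pollutant level, road density, building height]
grid = [
    [[1.0, 0.5, 2.0], [0.0, 1.0, 1.0]],
    [[2.0, 0.0, 0.5], [1.0, 1.0, 1.0]],
]
kernel = [
    [1.0, 2.0, 0.0],  # output channel 0 mixes pollutant level and road density
    [0.0, 0.0, 1.0],  # output channel 1 passes building height through
]
bias = [0.0, 0.5]

out = conv1x1(grid, kernel, bias)
```

Each output cell depends only on the channels of the same input cell, so the layer mixes features at a location without blurring neighbouring locations.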

arXiv:2103.12910 [pdf, other]

AQEyes: Visual Analytics for Anomaly Detection and Examination of Air Quality Data

Authors: Dongyu Liu, Kalyan Veeramachaneni, Alexander Geiger, Victor O. K. Li, Huamin Qu

Abstract: Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis have their own limitations regarding parameter selection (e.g., need for extensive domain knowledge), computational expense, general applicability (e.g., require labeled data), interpretability, and the efficiency of analysis. Furthermore, the poor quality of collected air quality data (inconsistently formatted and sometimes missing) also increases the difficulty of analysis substantially. In this paper, we systematically formulate design requirements for a system that can solve these limitations and then propose AQEyes, an integrated visual analytics system for efficiently monitoring, detecting, and examining anomalies in air quality data. In particular, we propose a unified end-to-end tunable machine learning pipeline that includes several data pre-processors and featurizers to deal with data quality issues. The pipeline integrates an efficient unsupervised anomaly detection method that works without the use of labeled data and overcomes the limitations of existing approaches. Further, we develop an interactive visualization system to visualize the outputs from the pipeline. The system incorporates a set of novel visualization and interaction designs, allowing analysts to visually examine air quality dynamics and anomalous events in multiple scales and from multiple facets.
We demonstrate the performance of this pipeline through a quantitative evaluation and show the effectiveness of the visualization system using qualitative case studies on real-world datasets.

Submitted 23 March, 2021; originally announced March 2021.

Comments: 11 pages, 6 figures

arXiv:2102.11503 [pdf, other]

Two Sides of Meta-Learning Evaluation: In vs. Out of Distribution

Authors: Amrith Setlur, Oscar Li, Virginia Smith

Abstract: We categorize meta-learning evaluation into two settings: $\textit{in-distribution}$ [ID], in which the train and test tasks are sampled $\textit{iid}$ from the same underlying task distribution, and $\textit{out-of-distribution}$ [OOD], in which they are not. While most meta-learning theory and some FSL applications follow the ID setting, we identify that most existing few-shot classification benchmarks instead reflect OOD evaluation, as they use disjoint sets of train (base) and test (novel) classes for task generation. This discrepancy is problematic because -- as we show on numerous benchmarks -- meta-learning methods that perform better on existing OOD datasets may perform significantly worse in the ID setting. In addition, in the OOD setting, even though current FSL benchmarks seem befitting, our study highlights concerns in 1) reliably performing model selection for a given meta-learning method, and 2) consistently comparing the performance of different methods. To address these concerns, we provide suggestions on how to construct FSL benchmarks to allow for ID evaluation as well as more reliable OOD evaluation. Our work aims to inform the meta-learning community about the importance and distinction of ID vs. OOD evaluation, as well as the subtleties of OOD evaluation with current benchmarks.

Submitted 27 October, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

arXiv:2102.08504 [pdf, other]

Label Leakage and Protection in Two-party Split Learning

Authors: Oscar Li, Jiankai Sun, Xin Yang, Weihao Gao, Hongyi Zhang, Junyuan Xie, Virginia Smith, Chong Wang

Abstract: Two-party split learning is a popular technique for learning a model across feature-partitioned data. In this work, we explore whether it is possible for one party to steal the private label information from the other party during split training, and whether there are methods that can protect against such attacks. Specifically, we first formulate a realistic threat model and propose a privacy loss metric to quantify label leakage in split learning. We then show that there exist two simple yet effective methods within the threat model that can allow one party to accurately recover private ground-truth labels owned by the other party. To combat these attacks, we propose several random perturbation techniques, including $\texttt{Marvell}$, an approach that strategically finds the structure of the noise perturbation by minimizing the amount of label leakage (measured through our quantification metric) of a worst-case adversary. We empirically demonstrate the effectiveness of our protection techniques against the identified attacks, and show that $\texttt{Marvell}$ in particular has improved privacy-utility tradeoffs relative to baseline approaches.

Submitted 24 May, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: Accepted to ICLR 2022 (https://openreview.net/forum?id=cOtBRgsf2fO)

arXiv:2011.14048 [pdf, other]

Is Support Set Diversity Necessary for Meta-Learning?

Authors: Amrith Setlur, Oscar Li, Virginia Smith

Abstract: Meta-learning is a popular framework for learning with limited data in which an algorithm is produced by training over multiple few-shot learning tasks. For classification problems, these tasks are typically constructed by sampling a small number of support and query examples from a subset of the classes. While conventional wisdom is that task diversity should improve the performance of meta-learning, in this work we find evidence to the contrary: we propose a modification to traditional meta-learning approaches in which we keep the support sets fixed across tasks, thus reducing task diversity. Surprisingly, we find that not only does this modification not result in adverse effects, it almost always improves the performance for a variety of datasets and meta-learning methods. We also provide several initial analyses to understand this phenomenon. Our work serves to: (i) more closely investigate the effect of support set construction for the problem of meta-learning, and (ii) suggest a simple, general, and competitive baseline for few-shot learning.

Submitted 7 October, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

Journal ref: NeurIPS 2020 Workshop on Meta-learning

arXiv:2010.02646 [pdf, other]

On the Sparsity of Neural Machine Translation Models

Authors: Yong Wang, Longyue Wang, Victor O. K. Li, Zhaopeng Tu

Abstract: Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.

Submitted 6 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2004.09681 [pdf, other]

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

Authors: Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li

Abstract: The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. Moreover, the AU co-occurring pattern can be reflected by activating a set of feature channels, where each channel encodes a specific visual pattern of AU. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels. Specifically, we introduce a semantic correspondence convolution (SCC) module to dynamically compute the correspondences from deep and low resolution feature maps, and thus enhancing the discriminability of features. The experimental results demonstrate the effectiveness and the superior performance of our method on two benchmark datasets.

Submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted at AAAI 2020

arXiv:1911.09912 [pdf, other]

Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks

Authors: Yong Wang, Longyue Wang, Shuming Shi, Victor O. K. Li, Zhaopeng Tu

Abstract: The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In response to this problem, we augment NMT model with additional domain transformation networks to transform the general representations to domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals by leveraging the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves comparable results with the fine-tuning approach that requires multiple models to preserve the particular knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.

Submitted 22 November, 2019; originally announced November 2019.

Comments: AAAI 2020

arXiv:1910.07843 [pdf, ps, other]

doi 10.1109/TWC.2020.3002891

Max-min Fairness of K-user Cooperative Rate-Splitting in MISO Broadcast Channel with User Relaying

Authors: Yijie Mao, Bruno Clerckx, Jian Zhang, Victor O. K. Li, Mohammed Arafah

Abstract: Cooperative Rate-Splitting (CRS) strategy, relying on linearly precoded rate-splitting at the transmitter and opportunistic transmission of the common message by the relaying user, has recently been shown to outperform typical Non-cooperative Rate-Splitting (NRS), Cooperative Non-Orthogonal Multiple Access (C-NOMA) and Space Division Multiple Access (SDMA) in a two-user Multiple Input Single Output (MISO) Broadcast Channel (BC) with user relaying. In this work, the existing two-user CRS transmission strategy is generalized to the K-user case. We study the problem of jointly optimizing the precoders, message split, time slot allocation, and relaying user scheduling with the objective of maximizing the minimum rate among users. An efficient self-organizing relaying protocol is first proposed followed by a Successive Convex Approximation (SCA)-based algorithm to jointly optimize time slot, precoders and message split. Numerical results show that the worst-case achievable rate achieved by CRS is significantly increased over that of NRS and SDMA in a wide range of network loads and user deployments. Importantly, the proposed SCA-based algorithm dramatically reduces the computational complexity without any rate loss compared with the conventional algorithm in the literature of CRS. Therefore, we conclude that the proposed K-user CRS is more powerful than the existing transmission schemes.

Submitted 5 August, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

Comments: accepted by IEEE Transactions on Wireless Communications

arXiv:1906.10651 [pdf, other]

Interpretable Image Recognition with Hierarchical Prototypes

Authors: Peter Hase, Chaofan Chen, Oscar Li, Cynthia Rudin

Abstract: Vision models are interpretable when they classify objects on the basis of features that a person can directly understand. Recently, methods relying on visual feature prototypes have been developed for this purpose. However, in contrast to how humans categorize objects, these approaches have not yet made use of any taxonomical organization of class labels. With such an approach, for instance, we may see why a chimpanzee is classified as a chimpanzee, but not why it was considered to be a primate or even an animal. In this work we introduce a model that uses hierarchically organized prototypes to classify objects at every level in a predefined taxonomy. Hence, we may find distinct explanations for the prediction an image receives at each level of the taxonomy. The hierarchical prototypes enable the model to perform another important task: interpretably classifying images from previously unseen classes at the level of the taxonomy to which they correctly relate, e.g. classifying a hand gun as a weapon, when the only weapons in the training data are rifles. With a subset of ImageNet, we test our model against its counterpart black-box model on two tasks: 1) classification of data from familiar classes, and 2) classification of data from previously unseen classes at the appropriate level in the taxonomy. We find that our model performs approximately as well as its counterpart black-box model while allowing for each classification to be interpreted.

Submitted 24 August, 2019; v1 submitted 25 June, 2019; originally announced June 2019.

Comments: Published as a full paper at HCOMP 2019

arXiv:1906.01181 [pdf, other]

Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

Authors: Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li

Abstract: Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.

Submitted 3 June, 2019; originally announced June 2019.

Comments: Accepted by ACL 2019

arXiv:1905.01422 [pdf, other]

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks

Authors: Yushu Chen, Hao Jing, Wenlai Zhao, Zhiqiang Liu, Ouyi Li, Liang Qiao, Wei Xue, Guangwen Yang

Abstract: We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement. We analyze the convergence properties by modeling the training process as a dynamic system, which provides a guideline to select the configurable observation factor without grid search. ARSG yields $O(1/\sqrt{T})$ convergence rate in non-convex settings, that can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that ARSG achieves both faster convergence and better generalization, compared with popular adaptive methods, such as ADAM, NADAM, AMSGRAD, and RANGER for the tested problems. In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization.

Submitted 6 September, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

Comments: The generalization is improved by modifying the preconditioner. For training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization. We also present a convergence bound in non-convex settings

arXiv:1902.07851 [pdf, ps, other]

Rate-Splitting for Multi-User Multi-Antenna Wireless Information and Power Transfer

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: In a multi-user multi-antenna Simultaneous Wireless Information and Power Transfer (SWIPT) network, the transmitter sends information to the Information Receivers (IRs) and energy to Energy Receivers (ERs) concurrently. A conventional approach is based on Multi-User Linear Precoding (MU-LP) where each IR directly decodes the intended stream by fully treating the interference from other IRs and ERs as noise. In this paper, we investigate the application of linearly-precoded Rate-Splitting (RS) in Multiple Input Single Output (MISO) SWIPT Broadcast Channel (BC). By splitting the messages of IRs into private and common parts and encoding the common parts into a common stream decoded by all IRs, RS manages the interference dynamically. The precoders are designed such that the Weighted Sum Rate (WSR) of IRs is maximized under the total transmit power constraint and the sum energy constraint for ERs. Numerical results show that the proposed RS-assisted strategy provides a better rate-energy tradeoff in MISO SWIPT BC. Under a sum energy constraint of ERs, RS-assisted strategy achieves better WSR performance of IRs than MU-LP and NOMA in a wide range of IR and ER deployments. Hence, we draw the conclusion that RS is superior for downlink SWIPT networks.

Submitted 2 July, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: 5 pages, 3 figures. This is the latest version. The typos in the version accepted by SPAWC 2019 have been corrected

arXiv:1808.08437 [pdf, other]

Meta-Learning for Low-Resource Neural Machine Translation

Authors: Jiatao Gu, Yong Wang, Yun Chen, Kyunghyun Cho, Victor O. K. Li

Abstract: In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation (Gu et al., 2018) to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro, Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach (Zoph et al., 2016) and enables us to train a competitive NMT system with only a fraction of training examples. For instance, the proposed approach can achieve as high as 22.04 BLEU on Romanian-English WMT'16 by seeing only 16,000 translated words (~600 parallel sentences).

Submitted 25 August, 2018; originally announced August 2018.

Comments: Accepted as a full paper at EMNLP 2018

arXiv:1808.08325 [pdf, ps, other]

Rate-Splitting for Multi-Antenna Non-Orthogonal Unicast and Multicast Transmission: Spectral and Energy Efficiency Analysis

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: In a Non-Orthogonal Unicast and Multicast (NOUM) transmission system, a multicast stream intended to all the receivers is superimposed in the power domain on the unicast streams. One layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding its intended unicast stream. In this paper, we first show that a linearly-precoded 1-layer Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. We further propose multi-layer transmission strategies based on the generalized RS and power-domain Non-Orthogonal Multiple Access (NOMA). Two different objectives are studied for the design of the precoders, namely, maximizing the Weighted Sum Rate (WSR) of the unicast messages and maximizing the system Energy Efficiency (EE), both subject to Quality of Service (QoS) rate requirements of all the messages and a sum power constraint. A Weighted Minimum Mean Square Error (WMMSE)-based algorithm and a Successive Convex Approximation (SCA)-based algorithm are proposed to solve the WSR and EE problems, respectively. Numerical results show that the proposed RS-assisted NOUM transmission strategies are more spectrally and energy efficient than the conventional Multi-User Linear-Precoding (MU-LP), Orthogonal Multiple Access (OMA) and power-domain NOMA in a wide range of user deployments (with a diversity of channel directions, channel strengths and qualities of channel state information at the transmitter) and network loads (underloaded and overloaded regimes). It is superior for the downlink multi-antenna NOUM transmission.

Submitted 19 September, 2019; v1 submitted 24 August, 2018; originally announced August 2018.

Comments: Accepted by IEEE Transactions on Communications

arXiv:1808.02252 [pdf, other]

Efficient and DoS-resistant Consensus for Permissioned Blockchains

Authors: Xusheng Chen, Shixiong Zhao, Ji Qi, Jianyu Jiang, Haoze Song, Cheng Wang, Tsz On Li, T. -H. Hubert Chan, Fengwei Zhang, Xiapu Luo, Sen Wang, Gong Zhang, Heming Cui

Abstract: Existing permissioned blockchain systems designate a fixed and explicit group of committee nodes to run a consensus protocol that confirms the same sequence of blocks among all nodes. Unfortunately, when such a permissioned blockchain runs in a large scale on the Internet, these explicit committee nodes can be easily turned down by denial-of-service (DoS) or network partition attacks. Although work proposes scalable BFT protocols that run on a larger number of committee nodes, their efficiency drops dramatically when only a small number of nodes are attacked. In this paper, our EGES protocol leverages Intel SGX to develop a new abstraction called "stealth committee", which effectively hides the committee nodes into a large pool of fake committee nodes. EGES selects a distinct group of stealth committee for each block and confirms the same sequence of blocks among all nodes with overwhelming probability. Evaluation on typical geo-distributed settings shows that: (1) EGES is the first permissioned blockchain's consensus protocol that can tolerate tough DoS and network partition attacks; and (2) EGES achieves comparable throughput and latency as existing permissioned blockchains' protocols.

Submitted 14 December, 2020; v1 submitted 7 August, 2018; originally announced August 2018.

arXiv:1807.10575 [pdf]

Multi-Region Ensemble Convolutional Neural Network for Facial Expression Recognition

Authors: Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li

Abstract: Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to image recognition field due to the discriminative power of Convolutional Neural Network (CNN). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning power of CNN models by capturing both the global and the local features from multiple human face sub-regions. Second, the weighted prediction scores from each sub-network are aggregated to produce the final prediction of high accuracy. Third, we investigate the effects of different sub-regions of the whole face on facial expression recognition. Our proposed method is evaluated based on two well-known publicly available facial expression databases: AFEW 7.0 and RAF-DB, and has been shown to achieve the state-of-the-art recognition accuracy.

Submitted 11 July, 2018; originally announced July 2018.

Comments: 10 pages, 5 figures, Accepted by ICANN 2018

arXiv:1807.02872 [pdf, other]

Large Margin Few-Shot Learning

Authors: Yong Wang, Xiao-Ming Wu, Qimai Li, Jiatao Gu, Wangmeng Xiang, Lei Zhang, Victor O. K. Li

Abstract: The key issue of few-shot learning is learning to generalize. This paper proposes a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function for training. Extensive experiments on two state-of-the-art few-shot learning methods, graph neural networks and prototypical networks, show that our method can improve the performance of existing models substantially with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.

Submitted 21 September, 2018; v1 submitted 8 July, 2018; originally announced July 2018.

Comments: 17 pages, 5 figures, 7 tables

arXiv:1806.10574 [pdf, other]

This Looks Like That: Deep Learning for Interpretable Image Recognition

Authors: Chaofan Chen, Oscar Li, Chaofan Tao, Alina Jade Barnett, Jonathan Su, Cynthia Rudin

Abstract: When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image, and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture -- prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people on how to solve challenging image classification tasks. The network uses only image-level labels for training without any annotations for parts of images. We demonstrate our method on the CUB-200-2011 dataset and the Stanford Cars dataset. Our experiments show that ProtoPNet can achieve comparable accuracy with its analogous non-interpretable counterpart, and when several ProtoPNets are combined into a larger network, it can achieve an accuracy that is on par with some of the best-performing deep models. Moreover, ProtoPNet provides a level of interpretability that is absent in other interpretable deep models.

Submitted 28 December, 2019; v1 submitted 27 June, 2018; originally announced June 2018.

Comments: Chaofan Chen and Oscar Li contributed equally to this work. This work has been accepted for spotlight presentation (top 3% of papers) at NeurIPS 2019

Journal ref: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

arXiv:1804.10516 [pdf, ps, other]

Rate-Splitting Multiple Access for Coordinated Multi-Point Joint Transmission

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: As a promising downlink multiple access scheme, Rate-Splitting Multiple Access (RSMA) has been shown to achieve superior spectral and energy efficiencies compared with Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) in downlink single-cell systems. By relying on linearly precoded rate-splitting at the transmitter and successive interference cancellation at the receivers, RSMA has the capability of partially decoding the interference and partially treating the interference as noise, and therefore copes with a wide range of user deployments and network loads. In this work, we further study RSMA in downlink Coordinated Multi-Point (CoMP) Joint Transmission (JT) networks by investigating the optimal beamformer design to maximize the Weighted Sum-Rate (WSR) of all users subject to individual Quality of Service (QoS) rate constraints and per base station power constraints. Numerical results show that, in CoMP JT, RSMA achieves significant WSR improvement over SDMA and NOMA in a wide range of inter-user and inter-cell channel strength disparities. Specifically, SDMA (resp. NOMA) is more suited to deployments with little (resp. large) inter-user channel strength disparity and large (resp. little) inter-cell channel disparity, while RSMA is suited to any deployment. We conclude that RSMA provides rate, robustness and QoS enhancements over SDMA and NOMA in CoMP JT networks.

Submitted 16 January, 2019; v1 submitted 27 April, 2018; originally announced April 2018.

Comments: 6 pages, 6 figures

arXiv:1804.08330 [pdf, ps, other]

doi 10.1109/ISWCS.2018.8491100

Energy Efficiency of Rate-Splitting Multiple Access, and Performance Benefits over SDMA and NOMA

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: Rate-Splitting Multiple Access (RSMA) is a general and powerful multiple access framework for downlink multi-antenna systems, and contains Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) as special cases. RSMA relies on linearly precoded rate-splitting with Successive Interference Cancellation (SIC) to decode part of the interference and treat the remaining part of the interference as noise. Recently, RSMA has been shown to outperform both SDMA and NOMA rate-wise in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths and qualities of Channel State Information at the Transmitter). Moreover, RSMA was shown to provide spectral efficiency and QoS enhancements over NOMA at a lower computational complexity for the transmit scheduler and the receivers. In this paper, we build upon those results and investigate the energy efficiency of RSMA compared to SDMA and NOMA. Considering a multiple-input single-output broadcast channel, we show that RSMA is more energy-efficient than SDMA and NOMA in a wide range of user deployments (with a diversity of channel directions and channel strengths). We conclude that RSMA is more spectrally and energy-efficient than SDMA and NOMA.

Submitted 21 November, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

Comments: 6 pages, 5 figures

Journal ref: 2018 15th International Symposium on Wireless Communication Systems (ISWCS), Lisbon, 2018

arXiv:1804.07915 [pdf, other]

A Stable and Effective Learning Strategy for Trainable Greedy Decoding

Authors: Yun Chen, Victor O. K. Li, Kyunghyun Cho, Samuel R. Bowman

Abstract: Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel corpora and three architectures show that the method yields substantial improvements in translation quality and speed over each base system.

Submitted 27 August, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

Comments: Accepted by EMNLP 2018

arXiv:1802.05567 [pdf, ps, other]

doi 10.1109/SPAWC.2018.8445774

Rate-Splitting for Multi-Antenna Non-Orthogonal Unicast and Multicast Transmission

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: In a superimposed unicast and multicast transmission system, one layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding the unicast stream. In this paper, we show that a linearly-precoded Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. By splitting the unicast message into common and private parts and encoding the common parts along with the multicast message into a super-common stream decoded by all users, the SIC is used for the dual purpose of separating the unicast and multicast streams as well as better managing the multi-user interference between the unicast streams. The precoders are designed with the objective of maximizing the Weighted Sum Rate (WSR) of the unicast messages subject to a Quality of Service (QoS) requirement of the multicast message and a sum power constraint. Numerical results show that RS outperforms existing Multi-User Linear-Precoding (MU-LP) and power-domain Non-Orthogonal Multiple Access (NOMA) in a wide range of user deployments (with a diversity of channel directions and channel strengths). Moreover, since one layer of SIC is required to separate the unicast and multicast streams, the performance gain of RS comes without any increase in the receiver complexity compared with MU-LP. Hence, in such non-orthogonal unicast and multicast transmissions, RS provides rate and QoS enhancements at no extra cost for the receivers.

Submitted 16 February, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: arXiv admin note: text overlap with arXiv:1710.11018

Journal ref: 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Kalamata, 2018, pp. 1-5

arXiv:1802.05368 [pdf, other]

Universal Neural Machine Translation for Extremely Low Resource Languages

Authors: Jiatao Gu, Hany Hassan, Jacob Devlin, Victor O. K. Li

Abstract: In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of strong baseline system which uses multilingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting.

Submitted 16 April, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: NAACL-HLT 2018

arXiv:1802.03116 [pdf, other]

Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

Authors: Yun Chen, Yang Liu, Victor O. K. Li

Abstract: While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architecture) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods.

Submitted 8 February, 2018; originally announced February 2018.

Comments: Published at AAAI-18

arXiv:1711.07652 [pdf, other]

A Unified Framework for Wide Area Measurement System Planning

Authors: James J. Q. Yu, Albert Y. S. Lam, David J. Hill, Victor O. K. Li

Abstract: Wide area measurement system (WAMS) is one of the essential components in the future power system. To make WAMS construction plans, practical models of the power network observability, reliability, and underlying communication infrastructures need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning to cover most realistic concerns in the construction process. The framework jointly optimizes the system construction cost, measurement reliability, and volume of synchrophasor data traffic resulting in a multi-objective optimization problem, which provides multiple Pareto optimal solutions to suit different requirements by the utilities. The framework is verified on two IEEE test systems. The simulation results demonstrate the trade-off relationships among the proposed objectives. Moreover, the proposed framework can develop optimal WAMS plans for full observability with minimal cost. This work develops a comprehensive framework for most practical WAMS construction designs.

Submitted 21 November, 2017; originally announced November 2017.

arXiv:1711.07651 [pdf, other]

doi 10.1109/ACCESS.2017.2746093

Delay Aware Intelligent Transient Stability Assessment System

Authors: James J. Q. Yu, Albert Y. S. Lam, David J. Hill, Victor O. K. Li

Abstract: Transient stability assessment is a critical tool for power system design and operation. With the emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focus on investigating the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments to achieve a much shorter response time by utilizing incomplete system variable measurements. Compared with existing work, our system is able to make accurate assessments with a significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, in which accurate assessments can be developed with time one third less than state-of-the-art methodologies. Moreover, the simulations indicate that noise in the measurements has trivial impact on the assessment performance, demonstrating the robustness of the proposed system.

Submitted 21 November, 2017; originally announced November 2017.

arXiv:1711.02281 [pdf, other]

Non-Autoregressive Neural Machine Translation

Authors: Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher

Abstract: Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.

Submitted 8 March, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: Accepted by ICLR 2018

arXiv:1710.11018 [pdf, ps, other]

doi 10.1186/s13638-018-1104-7

Rate-Splitting Multiple Access for Downlink Communication Systems: Bridging, Generalizing and Outperforming SDMA and NOMA

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: Space-Division Multiple Access (SDMA) utilizes linear precoding to separate users in the spatial domain and relies on fully treating any residual multi-user interference as noise. Non-Orthogonal Multiple Access (NOMA) uses linearly precoded superposition coding with successive interference cancellation (SIC) and relies on user grouping and ordering to enforce some users to fully decode and cancel interference created by other users. In this paper, we argue that to efficiently cope with the high throughput, heterogeneity of Quality-of-Service (QoS), and massive connectivity requirements of future multi-antenna wireless networks, multiple access design needs to depart from SDMA and NOMA. We develop a novel multiple access framework, called Rate-Splitting Multiple Access (RSMA). RSMA is a more general and powerful multiple access for downlink multi-antenna systems that contains SDMA and NOMA as special cases. RSMA relies on linearly precoded rate-splitting with SIC to decode part of the interference and treat the remaining part of the interference as noise. This capability of RSMA to partially decode interference and partially treat interference as noise enables to softly bridge the two extremes of fully decoding interference and treating interference as noise, and provide room for rate and QoS enhancements, and complexity reduction.
The three multiple access schemes are compared and extensive numerical results show that RSMA provides a smooth transition between SDMA and NOMA and outperforms them both in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths and qualities of Channel State Information at the Transmitter). Moreover, RSMA provides rate and QoS enhancements over NOMA at a lower computational complexity for the transmit scheduler and the receivers (number of SIC layers).

Submitted 17 April, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Journal ref: EURASIP Journal on Wireless Communications and Networking, vol. 2018, no. 1, p. 133, May 2018

arXiv:1710.04806 [pdf, other]

Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions

Authors: Oscar Li, Hao Liu, Chaofan Chen, Cynthia Rudin

Abstract: Deep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability -- they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as "black box" models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are loyal to what the network actually computes.

Submitted 21 November, 2017; v1 submitted 13 October, 2017; originally announced October 2017.

Comments: The first two authors contributed equally, 8 pages, accepted in AAAI 2018

arXiv:1706.07518 [pdf, other]

Neural Machine Translation with Gumbel-Greedy Decoding

Authors: Jiatao Gu, Daniel Jiwoong Im, Victor O. K. Li

Abstract: Previous neural machine translation models used some heuristic search algorithms (e.g., beam search) in order to avoid solving the maximum a posteriori problem over translation sentences at test time. In this paper, we propose the Gumbel-Greedy Decoding which trains a generative network to predict translation under a trained model. We solve such a problem using the Gumbel-Softmax reparameterization, which makes our generative network differentiable and trainable through standard stochastic gradient methods. We empirically demonstrate that our proposed model is effective for generating sequences of discrete words.

Submitted 22 June, 2017; originally announced June 2017.

arXiv:1705.07267 [pdf, other]

Search Engine Guided Non-Parametric Neural Machine Translation

Authors: Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li

Abstract: In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage--retrieval stage--, an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered based on a fuzzy matching score based on edit distance. In the second stage--translation stage--, a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach and the improvement is more significant when more relevant sentence pairs were retrieved.

Submitted 8 March, 2018; v1 submitted 20 May, 2017; originally announced May 2017.

Comments: Accepted by AAAI 2018

arXiv:1705.00753 [pdf, other]

A Teacher-Student Framework for Zero-Resource Neural Machine Translation

Authors: Yun Chen, Yang Liu, Yong Cheng, Victor O. K. Li

Abstract: While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model ("student") without parallel corpora available, guided by an existing pivot-to-target NMT model ("teacher") on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs.

Submitted 1 May, 2017; originally announced May 2017.

Comments: Accepted as a long paper by ACL 2017

arXiv:1702.02429 [pdf, other]

Trainable Greedy Decoding for Neural Machine Translation

Authors: Jiatao Gu, Kyunghyun Cho, Victor O. K. Li

Abstract: Recent research in neural machine translation has largely focused on two aspects; neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research community. In this paper, we solely focus on the problem of decoding given a trained neural machine translation model. Instead of trying to build a new decoding algorithm for any specific decoding objective, we propose the idea of trainable decoding algorithm in which we train a decoding algorithm to find a translation that maximizes an arbitrary decoding objective. More specifically, we design an actor that observes and manipulates the hidden state of the neural machine translation decoder and propose to train it using a variant of deterministic policy gradient. We extensively evaluate the proposed algorithm using four language pairs and two decoding objectives and show that we can indeed train a trainable greedy decoder that generates a better translation (in terms of a target decoding objective) with minimal computational overhead.

Submitted 8 February, 2017; originally announced February 2017.

Comments: 10 pages

arXiv:1612.05506 [pdf, other]

Cache-Enabled Heterogeneous Cellular Networks: Optimal Tier-Level Content Placement

Authors: Juan Wen, Kaibin Huang, Sheng Yang, Victor O. K. Li

Abstract: Caching popular contents at base stations (BSs) of a heterogeneous cellular network (HCN) avoids frequent information passage from content providers to the network edge, thereby reducing latency and alleviating traffic congestion in backhaul links. In general, the optimal strategies for content placement in HCNs remain largely unknown and deriving them forms the theme of this paper. To this end, we adopt the popular random HCN model where $K$ tiers of BSs are modelled as independent Poisson point processes distributed in the plane with different densities. Further, the random caching scheme is considered where each of a given set of $M$ files with corresponding popularity measures is placed at each BS of a particular tier with a corresponding probability, called placement probability. The probabilities are identical for all BSs in the same tier but vary over tiers, giving the name tier-level content placement. We consider the network performance metric, hit probability, defined as the probability that a file requested by the typical user is delivered successfully to the user. We maximize the hit probability over content placement probabilities, which yields the optimal tier-level placement policies. For the case of uniform received signal-to-interference thresholds for successful transmissions for BSs in different tiers, the policy is in closed-form where the placement probability for a particular file is proportional to the square-root of the corresponding popularity measure with an offset depending on BS caching capacities.
For the general case of non-uniform SIR thresholds, the optimization problem is non-convex and a sub-optimal placement policy is designed by approximation, which has a similar structure as in the case of uniform SIR thresholds and shown by simulation to be close-to-optimal.

Submitted 17 June, 2017; v1 submitted 16 December, 2016; originally announced December 2016.

Comments: 15 pages, 7 figures

arXiv:1610.07045 [pdf, other]

pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

Authors: Julie Yixuan Zhu, Chao Zhang, Huichu Zhang, Shi Zhi, Victor O. K. Li, Jiawei Han, Yu Zheng

Abstract: Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the \emph{spatiotemporal (ST) causal pathways} for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis, (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high, and (3) the \emph{ST causal pathways} are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present \emph{p-Causality}, a novel pattern-aided causality analysis approach that combines the strengths of \emph{pattern mining} and \emph{Bayesian learning} to efficiently and faithfully identify the \emph{ST causal pathways}. First, \emph{Pattern mining} helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduce the complexity by selecting the pattern-matched sensors as "causers". Then, \emph{Bayesian learning} carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results.
We evaluate our approach with three real-world data sets containing 982 air quality sensors, in three regions of China from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability.

Submitted 18 April, 2018; v1 submitted 22 October, 2016; originally announced October 2016.

arXiv:1610.00388 [pdf, other]

Learning to Translate in Real-time with Neural Machine Translation

Authors: Jiatao Gu, Graham Neubig, Kyunghyun Cho, Victor O. K. Li

Abstract: Translating in real-time, a.k.a. simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a method for beam-search applicable in the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively.

Submitted 10 January, 2017; v1 submitted 2 October, 2016; originally announced October 2016.

Comments: 10 pages, camera ready

arXiv:1603.06393 [pdf, other]

Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Authors: Jiatao Gu, Zhengdong Lu, Hang Li, Victor O. K. Li

Abstract: We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based model with remarkable margins on text summarization tasks.

Submitted 8 June, 2016; v1 submitted 21 March, 2016; originally announced March 2016.

Comments: 10 pages, 5 figures, accepted by ACL2016

Showing 1–50 of 66 results for author: Li, O