Search | arXiv e-print repository

Locally Private Estimation with Public Features

Authors: Yuheng Ma, Ke Jia, Hanfang Yang

Abstract: We initiate the study of locally differentially private (LDP) learning with public features. We define semi-feature LDP, where some features are publicly available while the remaining ones, along with the label, require protection under local differential privacy. Under semi-feature LDP, we demonstrate that the mini-max convergence rate for non-parametric regression is significantly reduced compared to that of classical LDP. Then we propose HistOfTree, an estimator that fully leverages the information contained in both public and private features. Theoretically, HistOfTree reaches the mini-max optimal convergence rate. Empirically, HistOfTree achieves superior performance on both synthetic and real data. We also explore scenarios where users have the flexibility to select features for protection manually. In such cases, we propose an estimator and a data-driven parameter tuning strategy, leading to analogous theoretical and empirical results.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2103.03460

doi 10.1016/j.patcog.2021.107907

Vicinal and categorical domain adaptation

Authors: Hui Tang, Kui Jia

Abstract: Unsupervised domain adaptation aims to learn a task classifier that performs well on the unlabeled target domain, by utilizing the labeled source domain. Inspiring results have been acquired by learning domain-invariant deep features via domain-adversarial training. However, its parallel design of task and domain classifiers limits the ability to achieve a finer category-level domain alignment. To promote categorical domain adaptation (CatDA), based on a joint category-domain classifier, we propose novel losses of adversarial training at both domain and category levels. Since the joint classifier can be regarded as a concatenation of individual task classifiers respectively for the two domains, our design principle is to enforce consistency of category predictions between the two task classifiers. Moreover, we propose a concept of vicinal domains whose instances are produced by a convex combination of pairs of instances respectively from the two domains. Intuitively, alignment of the possibly infinite number of vicinal domains enhances that of original domains. We propose novel adversarial losses for vicinal domain adaptation (VicDA) based on CatDA, leading to Vicinal and Categorical Domain Adaptation (ViCatDA). We also propose Target Discriminative Structure Recovery (TDSR) to recover the intrinsic target discrimination damaged by adversarial feature alignment. We also analyze the principles underlying the ability of our key designs to align the joint distributions.
Extensive experiments on several benchmark datasets demonstrate that we achieve the new state of the art.

Submitted 4 March, 2021; originally announced March 2021.

Comments: Accepted by Pattern Recognition

Journal ref: Pattern Recognition, Volume 115, July 2021, 107907

arXiv:2102.05298

Inductive Granger Causal Modeling for Multivariate Time Series

Authors: Yunfei Chu, Xiaowei Wang, Jianxin Ma, Kunyang Jia, Jingren Zhou, Hongxia Yang

Abstract: Granger causal modeling is an emerging topic that can uncover Granger causal relationship behind multivariate time series data. In many real-world systems, it is common to encounter a large amount of multivariate time series data collected from different individuals with sharing commonalities. However, there are ongoing concerns regarding Granger causality's applicability in such large scale complex scenarios, presenting both challenges and opportunities for Granger causal structure reconstruction. Existing methods usually train a distinct model for each individual, suffering from inefficiency and over-fitting issues. To bridge this gap, we propose an Inductive GRanger cAusal modeling (InGRA) framework for inductive Granger causality learning and common causal structure detection on multivariate time series, which exploits the shared commonalities underlying the different individuals. In particular, we train one global model for individuals with different Granger causal structures through a novel attention mechanism, called prototypical Granger causal attention. The model can detect common causal structures for different individuals and infer Granger causal structures for newly arrived individuals. Extensive experiments, as well as an online A/B test on an E-commercial advertising platform, demonstrate the superior performances of InGRA.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 6 pages, 6 figures

arXiv:2012.04280

Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering

Authors: Hui Tang, Xiatian Zhu, Ke Chen, Kui Jia, C. L. Philip Chen

Abstract: Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution diverges from the target one. Mainstream UDA methods strive to learn domain-aligned features such that classifiers trained on the source features can be readily applied to the target ones. Although impressive results have been achieved, these methods have a potential risk of damaging the intrinsic data structures of target discrimination, raising an issue of generalization particularly for UDA tasks in an inductive setting. To address this issue, we are motivated by a UDA assumption of structural similarity across domains, and propose to directly uncover the intrinsic target discrimination via constrained clustering, where we constrain the clustering solutions using structural source regularization that hinges on the very same assumption. Technically, we propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one, and we thus term our method as H-SRDC. Our hybrid model is based on a deep clustering framework that minimizes the Kullback-Leibler divergence between the distribution of network prediction and an auxiliary one, where we impose structural regularization by learning domain-shared classifier and cluster centroids. By enriching the structural similarity assumption, we are able to extend H-SRDC for a pixel-level UDA task of semantic segmentation.
We conduct extensive experiments on seven UDA benchmarks of image classification and semantic segmentation. With no explicit feature alignment, our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings. We make our implementation codes publicly available at https://github.com/huitangtang/H-SRDC.

Submitted 7 April, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Journal extension of our preliminary CVPR conference paper, under review, 16 pages, 8 figures, 9 tables

arXiv:2011.07478

Towards Understanding the Regularization of Adversarial Robustness on Neural Networks

Authors: Yuxin Wen, Shuai Li, Kui Jia

Abstract: The problem of adversarial examples has shown that modern Neural Network (NN) models could be rather fragile. Among the more established techniques to solve the problem, one is to require the model to be {\it $ε$-adversarially robust} (AR); that is, to require the model not to change predicted labels when any given input examples are perturbed within a certain range. However, it is observed that such methods would lead to standard performance degradation, i.e., the degradation on natural examples. In this work, we study the degradation through the regularization perspective. We identify quantities from generalization analysis of NNs; with the identified quantities we empirically find that AR is achieved by regularizing/biasing NNs towards less confident solutions by making the changes in the feature space (induced by changes in the instance space) of most layers smoother uniformly in all directions; so to a certain extent, it prevents sudden change in prediction w.r.t. perturbations. However, the end result of such smoothing concentrates samples around decision boundaries, resulting in less confident solutions, and leads to worse standard performance. Our studies suggest that one might consider ways that build AR into NNs in a gentler way to avoid the problematic regularization.

Submitted 15 November, 2020; originally announced November 2020.

Comments: Published as a conference paper at ICML 2020

arXiv:2007.07695

Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation

Authors: Yabin Zhang, Bin Deng, Kui Jia, Lei Zhang

Abstract: Motivated by the problem relatedness between unsupervised domain adaptation (UDA) and semi-supervised learning (SSL), many state-of-the-art UDA methods adopt SSL principles (e.g., the cluster assumption) as their learning ingredients. However, they tend to overlook the very domain-shift nature of UDA. In this work, we take a step further to study the proper extensions of SSL techniques for UDA. Taking the algorithm of label propagation (LP) as an example, we analyze the challenges of adopting LP to UDA and theoretically analyze the conditions of affinity graph/matrix construction in order to achieve better propagation of true labels to unlabeled instances. Our analysis suggests a new algorithm of Label Propagation with Augmented Anchors (A$^2$LP), which could potentially improve LP via generation of unlabeled virtual instances (i.e., the augmented anchors) with high-confidence label predictions. To make the proposed A$^2$LP useful for UDA, we propose empirical schemes to generate such virtual instances. The proposed schemes also tackle the domain-shift challenge of UDA by alternating between pseudo labeling via A$^2$LP and domain-invariant feature learning. Experiments show that such a simple SSL extension improves over representative UDA methods of domain-invariant feature learning, and could empower two state-of-the-art methods on benchmark UDA datasets. Our results show the value of further investigation on SSL techniques for UDA problems.

Submitted 15 July, 2020; originally announced July 2020.

Comments: ECCV2020 spotlight. Investigating SSL techniques for UDA. Codes are available at https://github.com/YBZh/Label-Propagation-with-Augmented-Anchors

Journal ref: ECCV2020

arXiv:2007.07203

Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations

Authors: Weihao Gao, Xiangjun Fan, Chong Wang, Jiankai Sun, Kai Jia, Wenzhi Xiao, Ruofan Ding, Xingyan Bin, Hui Yang, Xiaobing Liu

Abstract: One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, and then use some approximate nearest neighbor (ANN) search algorithm to find top candidates. In this paper, we present Deep Retrieval (DR), to learn a retrievable structure directly with user-item interaction data (e.g. clicks) without resorting to the Euclidean space assumption in ANN algorithms. DR's structure encodes all candidate items into a discrete latent space. Those latent codes for the candidates are model parameters and learnt together with other neural network parameters to maximize the same objective function. With the model learnt, a beam search over the structure is performed to retrieve the top candidates for reranking. Empirically, we first demonstrate that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline on two public datasets. Moreover, we show that, in a live production recommendation system, a deployed DR approach significantly outperforms a well-tuned ANN baseline in terms of engagement metrics. To the best of our knowledge, DR is among the first non-ANN algorithms successfully deployed at the scale of hundreds of millions of items for industrial recommendation systems.

Submitted 18 May, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

Comments: 9 pages, 6 figures

arXiv:2003.14058

MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning

Authors: Yuan Gao, Haoping Bai, Zequn Jie, Jiayi Ma, Kui Jia, Wei Liu

Abstract: We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). Existing NAS methods typically define different search spaces according to different tasks. In order to adapt to different task combinations (i.e., task sets), we disentangle the GP-MTL networks into single-task backbones (optionally encode the task priors), and a hierarchical and layerwise features sharing/fusing scheme across them. This enables us to design a novel and general task-agnostic search space, which inserts cross-task edges (i.e., feature fusion connections) into fixed single-task network backbones. Moreover, we also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture. This is realized with a minimum entropy regularization on the architecture weights during the search phase, which makes the architecture weights converge to near-discrete values and therefore achieves a single model. As a result, our searched model can be directly used for evaluation without (re-)training from scratch. We perform extensive experiments using different single-task backbones on various task sets, demonstrating the promising performance obtained by exploiting the hierarchical and layerwise features, as well as the desirable generalizability to different i) task sets and ii) single-task backbones. The code of our paper is available at https://github.com/bhpfelix/MTLNAS.

Submitted 31 March, 2020; originally announced March 2020.

Comments: Accepted to CVPR2020. The first two authors contribute equally

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2020

arXiv:2003.03021

Exploiting Verified Neural Networks via Floating Point Numerical Error

Authors: Kai Jia, Martin Rinard

Abstract: Researchers have developed neural network verification algorithms motivated by the need to characterize the robustness of deep neural networks. The verifiers aspire to answer whether a neural network guarantees certain properties with respect to all inputs in a space. However, many verifiers inaccurately model floating point arithmetic but do not thoroughly discuss the consequences. We show that the negligence of floating point error leads to unsound verification that can be systematically exploited in practice. For a pretrained neural network, we present a method that efficiently searches inputs as witnesses for the incorrectness of robustness claims made by a complete verifier. We also present a method to construct neural network architectures and weights that induce wrong results of an incomplete verifier. Our results highlight that, to achieve practically reliable verification of neural networks, any verification system must accurately (or conservatively) model the effects of any floating point computations in the network inference or verification system.

Submitted 1 October, 2021; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: SAS 2021

arXiv:2002.08681

doi 10.1109/TPAMI.2020.3036956

Unsupervised Multi-Class Domain Adaptation: Theory, Algorithms, and Practice

Authors: Yabin Zhang, Bin Deng, Hui Tang, Lei Zhang, Kui Jia

Abstract: In this paper, we study the formalism of unsupervised multi-class domain adaptation (multi-class UDA), which underlies a few recent algorithms whose learning objectives are only motivated empirically. Multi-Class Scoring Disagreement (MCSD) divergence is presented by aggregating the absolute margin violations in multi-class classification, and this proposed MCSD is able to fully characterize the relations between any pair of multi-class scoring hypotheses. By using MCSD as a measure of domain distance, we develop a new domain adaptation bound for multi-class UDA; its data-dependent, probably approximately correct bound is also developed that naturally suggests adversarial learning objectives to align conditional feature distributions across source and target domains. Consequently, an algorithmic framework of Multi-class Domain-adversarial learning Networks (McDalNets) is developed, and its different instantiations via surrogate learning objectives either coincide with or resemble a few recently popular methods, thus (partially) underscoring their practical effectiveness. Based on our identical theory for multi-class UDA, we also introduce a new algorithm of Domain-Symmetric Networks (SymmNets), which is featured by a novel adversarial strategy of domain confusion and discrimination. SymmNets affords simple extensions that work equally well under the problem settings of either closed set, partial, or open set UDA. We conduct careful empirical studies to compare different algorithms of McDalNets and our newly introduced SymmNets.
Experiments verify our theoretical analysis and show the efficacy of our proposed SymmNets. In addition, we have made our implementation code publicly available.

Submitted 22 November, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: CVPR extension; TPAMI camera ready version: https://ieeexplore.ieee.org/document/9253700; IEEE copyright; Codes are available at: https://github.com/YBZh/MultiClassDA

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 10 November 2020

arXiv:1911.00888

Multi-marginal Wasserstein GAN

Authors: Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

Abstract: Multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation. However, addressing this problem has two critical challenges: (i) Measuring the multi-marginal distance among different domains is very intractable; (ii) It is very difficult to exploit cross-domain correlations to match the target domain distributions. In this paper, we propose a novel Multi-marginal Wasserstein GAN (MWGAN) to minimize Wasserstein distance among domains. Specifically, with the help of multi-marginal optimal transport theory, we develop a new adversarial objective function with inner- and inter-domain constraints to exploit cross-domain correlations. Moreover, we theoretically analyze the generalization performance of MWGAN, and empirically evaluate it on the balanced and imbalanced translation tasks. Extensive experiments on toy and real-world datasets demonstrate the effectiveness of MWGAN.

Submitted 3 November, 2019; originally announced November 2019.

Comments: This paper is accepted by NeurIPS 2019

arXiv:1908.10611

Bayes EMbedding (BEM): Refining Representation by Integrating Knowledge Graphs and Behavior-specific Networks

Authors: Yuting Ye, Xuwu Wang, Jiangchao Yao, Kunyang Jia, Jingren Zhou, Yanghua Xiao, Hongxia Yang

Abstract: Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in varieties of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus either on knowledge graph embedding or behavior graph embedding while few works consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from knowledge graphs and behavior graphs. To be more specific, BEM takes as prior the pre-trained embeddings from the knowledge graph, and integrates them with the pre-trained embeddings from the behavior graphs via a Bayesian generative model. BEM is able to mutually refine the embeddings from both sides while preserving their own topological structures. To show the superiority of our method, we conduct a range of experiments on three benchmark datasets: node classification, link prediction, triplet classification on two small datasets related to Freebase, and item recommendation on a large-scale e-commerce dataset.

Submitted 28 August, 2019; originally announced August 2019.

Comments: 25 pages, 5 figures, 10 tables. CIKM 2019

arXiv:1905.05929

Orthogonal Deep Neural Networks

Authors: Kui Jia, Shuai Li, Yuxin Wen, Tongliang Liu, Dacheng Tao

Abstract: In this paper, we introduce the algorithms of Orthogonal Deep Neural Networks (OrthDNNs) to connect with recent interest of spectrally regularized deep learning methods. OrthDNNs are theoretically motivated by generalization analysis of modern DNNs, with the aim to find solution properties of network weights that guarantee better generalization. To this end, we first prove that DNNs are of local isometry on data distributions of practical interest; by using a new covering of the sample space and introducing the local isometry property of DNNs into generalization analysis, we establish a new generalization error bound that is both scale- and range-sensitive to singular value spectrum of each of networks' weight matrices. We prove that the optimal bound w.r.t. the degree of isometry is attained when each weight matrix has a spectrum of equal singular values, among which orthogonal weight matrix or a non-square one with orthonormal rows or columns is the most straightforward choice, suggesting the algorithms of OrthDNNs. We present both algorithms of strict and approximate OrthDNNs, and for the latter ones we propose a simple yet effective algorithm called Singular Value Bounding (SVB), which performs as well as strict OrthDNNs, but at a much lower computational cost. We also propose Bounded Batch Normalization (BBN) to make compatible use of batch normalization with OrthDNNs. We conduct extensive comparative studies by using modern architectures on benchmark image classification. Experiments show the efficacy of OrthDNNs.

Submitted 15 October, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: To Appear in IEEE Transactions on Pattern Analysis and Machine Intelligence

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

arXiv:1612.07222

Bayesian Decision Process for Cost-Efficient Dynamic Ranking via Crowdsourcing

Authors: Xi Chen, Kevin Jiao, Qihang Lin

Abstract: Rank aggregation based on pairwise comparisons over a set of items has a wide range of applications. Although considerable research has been devoted to the development of rank aggregation algorithms, one basic question is how to efficiently collect a large amount of high-quality pairwise comparisons for the ranking purpose. Because of the advent of many crowdsourcing services, a crowd of workers are often hired to conduct pairwise comparisons with a small monetary reward for each pair they compare. Since different workers have different levels of reliability and different pairs have different levels of ambiguity, it is desirable to wisely allocate the limited budget for comparisons among the pairs of items and workers so that the global ranking can be accurately inferred from the comparison results. To this end, we model the active sampling problem in crowdsourced ranking as a Bayesian Markov decision process, which dynamically selects item pairs and workers to improve the ranking accuracy under a budget constraint. We further develop a computationally efficient sampling policy based on knowledge gradient as well as a moment matching technique for posterior approximation. Experimental evaluations on both synthetic and real data show that the proposed policy achieves high ranking accuracy with a lower labeling cost.

Submitted 21 December, 2016; originally announced December 2016.

Journal ref: Journal of Machine Learning Research 17 (2016) 1-40

Showing 1–14 of 14 results for author: Jiao, K