Online and Distributed Robust Regressions under Adversarial Data Corruption

Zhang, Xuchao; Zhao, Liang; Boedihardjo, Arnold P.; Lu, Chang-Tien


arXiv:1710.00904 (cs)

[Submitted on 2 Oct 2017]

Abstract: In today's era of big data, robust least-squares regression becomes more challenging when adversarial corruption is considered alongside the explosive growth of datasets. Traditional robust methods can handle the noise but face several challenges when applied to huge datasets, including 1) the computational infeasibility of handling an entire dataset at once, 2) the existence of heterogeneously distributed corruption, and 3) the difficulty of estimating corruption when the data cannot be loaded in its entirety. This paper proposes online and distributed robust regression approaches, both of which address all of the above challenges concurrently. Specifically, the distributed algorithm optimizes the regression coefficients of each data block via heuristic hard thresholding and combines all the estimates in a distributed robust consolidation. Furthermore, an online version of the distributed algorithm is proposed to incrementally update the existing estimates with new incoming data. We also prove that our algorithms benefit from strong robustness guarantees in terms of regression coefficient recovery, with a constant upper bound on the error of state-of-the-art batch methods. Extensive experiments on synthetic and real datasets demonstrate that our approaches are superior to existing methods in effectiveness, with competitive efficiency.
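
The block-wise structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the fixed `keep_fraction` stands in for the heuristic corruption estimate, the coordinate-wise median stands in for the robust consolidation step, and all function names and parameters are hypothetical choices made for illustration.

```python
import numpy as np


def hard_threshold_regression(X, y, n_keep, n_iter=50):
    """Alternate a least-squares fit with re-selection of the n_keep
    points that have the smallest absolute residuals."""
    keep = np.arange(X.shape[0])  # start by trusting every point
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Fit only on the points currently believed to be uncorrupted.
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        residuals = np.abs(y - X @ w)
        # Hard thresholding: keep the n_keep smallest residuals.
        new_keep = np.sort(np.argpartition(residuals, n_keep - 1)[:n_keep])
        if np.array_equal(new_keep, keep):
            break  # the selected set stopped changing
        keep = new_keep
    return w


def fit_block(X, y, keep_fraction):
    """Estimate coefficients for one block. keep_fraction is a fixed,
    assumed uncorrupted ratio (the paper estimates it heuristically)."""
    n, p = X.shape
    n_keep = min(n, max(p, int(keep_fraction * n)))
    return hard_threshold_regression(X, y, n_keep)


def consolidate(estimates):
    """Combine per-block estimates with a coordinate-wise median
    (an assumed stand-in for the paper's robust consolidation)."""
    return np.median(np.stack(estimates), axis=0)


def distributed_robust_regression(blocks, keep_fraction=0.7):
    """Fit each (X, y) block independently, then consolidate."""
    estimates = [fit_block(X, y, keep_fraction) for X, y in blocks]
    return consolidate(estimates), estimates


def online_update(estimates, X_new, y_new, keep_fraction=0.7):
    """Fold a newly arrived block into the existing per-block estimates
    without revisiting the earlier data."""
    updated = estimates + [fit_block(X_new, y_new, keep_fraction)]
    return consolidate(updated), updated
```

In this sketch each block is processed independently, so the per-block fits could run on separate workers, and the online update only needs the stored per-block estimates rather than the earlier data.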

Comments:	Accepted by ICDM 2017
Subjects:	Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1710.00904 [cs.DS]
	(or arXiv:1710.00904v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1710.00904

Submission history

From: Xuchao Zhang
[v1] Mon, 2 Oct 2017 20:55:39 UTC (913 KB)
