Search | arXiv e-print repository

Class Probability Matching Using Kernel Methods for Label Shift Adaptation

Authors: Hongwei Wen, Annika Betken, Hanyuan Hang

Abstract: In domain adaptation, covariate shift and label shift problems are two distinct and complementary tasks. In covariate shift adaptation where the differences in data distribution arise from variations in feature probabilities, existing approaches naturally address this problem based on \textit{feature probability matching} (\textit{FPM}). However, for label shift adaptation where the differences in… ▽ More In domain adaptation, covariate shift and label shift problems are two distinct and complementary tasks. In covariate shift adaptation where the differences in data distribution arise from variations in feature probabilities, existing approaches naturally address this problem based on \textit{feature probability matching} (\textit{FPM}). However, for label shift adaptation where the differences in data distribution stem solely from variations in class probability, current methods still use FPM on the $d$-dimensional feature space to estimate the class probability ratio on the one-dimensional label space. To address label shift adaptation more naturally and effectively, inspired by a new representation of the source domain's class probability, we propose a new framework called \textit{class probability matching} (\textit{CPM}) which matches two class probability functions on the one-dimensional label space to estimate the class probability ratio, fundamentally different from FPM operating on the $d$-dimensional feature space. Furthermore, by incorporating the kernel logistic regression into the CPM framework to estimate the conditional probability, we propose an algorithm called \textit{class probability matching using kernel methods} (\textit{CPMKM}) for label shift adaptation. From the theoretical perspective, we establish the optimal convergence rates of CPMKM with respect to the cross-entropy loss for multi-class label shift adaptation. From the experimental perspective, comparisons on real datasets demonstrate that CPMKM outperforms existing FPM-based and maximum-likelihood-based algorithms. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.01046 [pdf, other]

Bagged Regularized $k$-Distances for Anomaly Detection

Authors: Yuchao Cai, Yuheng Ma, Hanfang Yang, Hanyuan Hang

Abstract: We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly detection, they suffer heavily from the sensitivity to the choice of the number of the nearest neighbors. In this paper, we propose a new distance-based algorithm cal… ▽ More We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly detection, they suffer heavily from the sensitivity to the choice of the number of the nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD) converting the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach enables us to successfully address the sensitivity challenge of the hyperparameter choice in distance-based algorithms. Moreover, when dealing with large-scale datasets, the efficiency issues can be addressed by the incorporated bagging technique in our BRDAD algorithm. On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity. On the practical side, we conduct numerical experiments on anomaly detection benchmarks to illustrate the insensitivity of parameter selection of our algorithm compared with other state-of-the-art distance-based methods. Moreover, promising improvements are brought by applying the bagging technique in our algorithm on real-world datasets. △ Less

Submitted 13 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

arXiv:2304.02690 [pdf, other]

Hierarchical B-frame Video Coding Using Two-Layer CANF without Motion Coding

Authors: David Alexandre, Hsueh-Ming Hang, Wen-Hsiao Peng

Abstract: Typical video compression systems consist of two main modules: motion coding and residual coding. This general architecture is adopted by classical coding schemes (such as international standards H.265 and H.266) and deep learning-based coding schemes. We propose a novel B-frame coding architecture based on two-layer Conditional Augmented Normalization Flows (CANF). It has the striking feature of… ▽ More Typical video compression systems consist of two main modules: motion coding and residual coding. This general architecture is adopted by classical coding schemes (such as international standards H.265 and H.266) and deep learning-based coding schemes. We propose a novel B-frame coding architecture based on two-layer Conditional Augmented Normalization Flows (CANF). It has the striking feature of not transmitting any motion information. Our proposed idea of video compression without motion coding offers a new direction for learned video coding. Our base layer is a low-resolution image compressor that replaces the full-resolution motion compressor. The low-resolution coded image is merged with the warped high-resolution images to generate a high-quality image as a conditioning signal for the enhancement-layer image coding in full resolution. One advantage of this architecture is significantly reduced computational complexity due to eliminating the motion information compressor. In addition, we adopt a skip-mode coding technique to reduce the transmitted latent samples. The rate-distortion performance of our scheme is slightly lower than that of the state-of-the-art learned B-frame coding scheme, B-CANF, but outperforms other learned B-frame coding schemes. However, compared to B-CANF, our scheme saves 45% of multiply-accumulate operations (MACs) for encoding and 27% of MACs for decoding. The code is available at https://nycu-clab.github.io. △ Less

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2212.14187 [pdf, other]

Learned Hierarchical B-frame Coding with Adaptive Feature Modulation for YUV 4:2:0 Content

Authors: Mu-Jung Chen, Hong-Sheng Xie, Cheng Chien, Wen-Hsiao Peng, Hsueh-Ming Hang

Abstract: This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023. We address specifically three issues, including (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with only one single model. Most learned video codecs operate internally in the RGB domain for P-frame coding.… ▽ More This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023. We address specifically three issues, including (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with only one single model. Most learned video codecs operate internally in the RGB domain for P-frame coding. B-frame coding for YUV 4:2:0 content is largely under-explored. In addition, while there have been prior works on variable-rate coding with conditional convolution, most of them fail to consider the content information. We build our scheme on conditional augmented normalized flows (CANF). It features conditional motion and inter-frame codecs for efficient B-frame coding. To cope with YUV 4:2:0 content, two conditional inter-frame codecs are used to process the Y and UV components separately, with the coding of the UV components conditioned additionally on the Y component. Moreover, we introduce adaptive feature modulation in every convolutional layer, taking into account both the content information and the coding levels of B-frames to achieve content-adaptive variable-rate coding. Experimental results show that our model outperforms x265 and the winner of last year's challenge on commonly used datasets in terms of PSNR-YUV. △ Less

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2210.09786 [pdf, other]

Bagged $k$-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets

Authors: Hanyuan Hang

Abstract: In this paper, we propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a new measurement called the \textit{probability of localized level sets} (\textit{PLLS}), which enables us to find all clusters for varying densities with a global threshold. On the theoretical side, we show that with a properly chosen number of… ▽ More In this paper, we propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a new measurement called the \textit{probability of localized level sets} (\textit{PLLS}), which enables us to find all clusters for varying densities with a global threshold. On the theoretical side, we show that with a properly chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the sub-sample size $s$, the bagging rounds $B$, and the number of nearest neighbors $k_L$ for the localized level sets, BDMBC can achieve optimal convergence rates for mode estimation. It turns out that with a relatively small $B$, the sub-sample size $s$ can be much smaller than the number of training data $n$ at each bagging round, and the number of nearest neighbors $k_D$ can be reduced simultaneously. Moreover, we establish optimal convergence results for the level set estimation of the PLLS in terms of Hausdorff distance, which reveals that BDMBC can find localized level sets for varying densities and thus enjoys local adaptivity. On the practical side, we conduct numerical experiments to empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, which demonstrates the promising accuracy and efficiency of our proposed algorithm. △ Less

Submitted 18 October, 2022; originally announced October 2022.

arXiv:2210.08225 [pdf, other]

Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding

Authors: Yung-Han Ho, Chih-Hsuan Lin, Peng-Yu Chen, Mu-Jung Chen, Chih-Peng Chang, Wen-Hsiao Peng, Hsueh-Ming Hang

Abstract: This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content. Most existing learning-based video compression models adopt the traditional hybrid-based coding architecture, which involves temporal prediction followed by residual coding. However, recent studies have shown that residual coding is sub-optimal from the information-theoretic perspective.… ▽ More This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content. Most existing learning-based video compression models adopt the traditional hybrid-based coding architecture, which involves temporal prediction followed by residual coding. However, recent studies have shown that residual coding is sub-optimal from the information-theoretic perspective. In addition, most existing models are optimized with respect to RGB content. Furthermore, they require separate models for variable-rate coding. To address these issues, this work presents an attempt to incorporate the conditional inter-frame coding for YUV 4:2:0 content. We introduce a conditional flow-based inter-frame coder to improve the inter-frame coding efficiency. To adapt our codec to YUV 4:2:0 content, we adopt a simple strategy of using space-to-depth and depth-to-space conversions. Lastly, we employ a rate-adaption net to achieve variable-rate coding without training multiple models. Experimental results show that our model performs better than x265 on UVG and MCL-JCV datasets in terms of PSNR-YUV. However, on the more challenging datasets from ISCAS'22 GC, there is still ample room for improvement. This insufficient performance is due to the lack of inter-frame coding capability at a large GOP size and can be mitigated by increasing the model capacity and applying an error propagation-aware training strategy. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Comments: Accepted by ISCAS 2022

arXiv:2207.01183 [pdf]

Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video using CNN and Bounding Box Propagation

Authors: Sandy Ardianto, Hsueh-Ming Hang, Wen-Huang Cheng

Abstract: We design a fast car detection and tracking algorithm for traffic monitoring fisheye video mounted on crossroads. We use ICIP 2020 VIP Cup dataset and adopt YOLOv5 as the object detection base model. The nighttime video of this dataset is very challenging, and the detection accuracy (AP50) of the base model is about 54%. We design a reliable car detection and tracking algorithm based on the concep… ▽ More We design a fast car detection and tracking algorithm for traffic monitoring fisheye video mounted on crossroads. We use ICIP 2020 VIP Cup dataset and adopt YOLOv5 as the object detection base model. The nighttime video of this dataset is very challenging, and the detection accuracy (AP50) of the base model is about 54%. We design a reliable car detection and tracking algorithm based on the concept of bounding box propagation among frames, which provides 17.9 percentage points (pp) and 6.2 pp. accuracy improvement over the base model for the nighttime and daytime videos, respectively. To speed up, the grayscale frame difference is used for the intermediate frames in a segment, which can double the processing speed. △ Less

Submitted 13 July, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: to be published in International Conference on Image Processing (ICIP) 2022, Bordeaux, France

arXiv:2112.02589 [pdf, other]

Local Adaptivity of Gradient Boosting in Histogram Transform Ensemble Learning

Authors: Hanyuan Hang

Abstract: In this paper, we propose a gradient boosting algorithm called \textit{adaptive boosting histogram transform} (\textit{ABHT}) for regression to illustrate the local adaptivity of gradient boosting algorithms in histogram transform ensemble learning. From the theoretical perspective, when the target function lies in a locally Hölder continuous space, we show that our ABHT can filter out the regions… ▽ More In this paper, we propose a gradient boosting algorithm called \textit{adaptive boosting histogram transform} (\textit{ABHT}) for regression to illustrate the local adaptivity of gradient boosting algorithms in histogram transform ensemble learning. From the theoretical perspective, when the target function lies in a locally Hölder continuous space, we show that our ABHT can filter out the regions with different orders of smoothness. Consequently, we are able to prove that the upper bound of the convergence rates of ABHT is strictly smaller than the lower bound of \textit{parallel ensemble histogram transform} (\textit{PEHT}). In the experiments, both synthetic and real-world data experiments empirically validate the theoretical results, which demonstrates the advantageous performance and local adaptivity of our ABHT. △ Less

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2112.02450 [pdf, other]

Adaptive Feature Interpolation for Low-Shot Image Generation

Authors: Mengyu Dai, Haibin Hang, Xiaoyang Guo

Abstract: Training of generative models especially Generative Adversarial Networks can easily diverge in low-data setting. To mitigate this issue, we propose a novel implicit data augmentation approach which facilitates stable training and synthesize high-quality samples without need of label information. Specifically, we view the discriminator as a metric embedding of the real data manifold, which offers p… ▽ More Training of generative models especially Generative Adversarial Networks can easily diverge in low-data setting. To mitigate this issue, we propose a novel implicit data augmentation approach which facilitates stable training and synthesize high-quality samples without need of label information. Specifically, we view the discriminator as a metric embedding of the real data manifold, which offers proper distances between real data points. We then utilize information in the feature space to develop a fully unsupervised and data-driven augmentation method. Experiments on few-shot generation tasks show the proposed method significantly improve results from strong baselines with hundreds of training samples. △ Less

Submitted 14 July, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: ECCV'22. Code available at https://github.com/dzld00/Adaptive-Feature-Interpolation-for-Low-Shot-Image-Generation

arXiv:2109.03378 [pdf, other]

Rethinking Multidimensional Discriminator Output for Generative Adversarial Networks

Authors: Mengyu Dai, Haibin Hang, Anuj Srivastava

Abstract: The study of multidimensional discriminator (critic) output for Generative Adversarial Networks has been underexplored in the literature. In this paper, we generalize the Wasserstein GAN framework to take advantage of multidimensional critic output and explore its properties. We also introduce a square-root velocity transformation (SRVT) block which favors training in the multidimensional setting.… ▽ More The study of multidimensional discriminator (critic) output for Generative Adversarial Networks has been underexplored in the literature. In this paper, we generalize the Wasserstein GAN framework to take advantage of multidimensional critic output and explore its properties. We also introduce a square-root velocity transformation (SRVT) block which favors training in the multidimensional setting. Proofs of properties are based on our proposed maximal p-centrality discrepancy, which is bounded above by p-Wasserstein distance and fits the Wasserstein GAN framework with multidimensional critic output n. Especially when n = 1 and p = 1, the proposed discrepancy equals 1-Wasserstein distance. Theoretical analysis and empirical evidence show that high-dimensional critic output has its advantage on distinguishing real and fake distributions, and benefits faster convergence and diversity of results. △ Less

Submitted 14 July, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: Frontiers in Adversarial Machine Learning ICML 2022

arXiv:2109.00531 [pdf, other]

Under-bagging Nearest Neighbors for Imbalanced Classification

Authors: Hanyuan Hang, Yuchao Cai, Hanfang Yang, Zhouchen Lin

Abstract: In this paper, we propose an ensemble learning algorithm called \textit{under-bagging $k$-nearest neighbors} (\textit{under-bagging $k$-NN}) for imbalanced classification problems. On the theoretical side, by develo** a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors $k$, the expected sub-sample size $s$, and the bagging rounds… ▽ More In this paper, we propose an ensemble learning algorithm called \textit{under-bagging $k$-nearest neighbors} (\textit{under-bagging $k$-NN}) for imbalanced classification problems. On the theoretical side, by develo** a new learning theory analysis, we show that with properly chosen parameters, i.e., the number of nearest neighbors $k$, the expected sub-sample size $s$, and the bagging rounds $B$, optimal convergence rates for under-bagging $k$-NN can be achieved under mild assumptions w.r.t.~the arithmetic mean (AM) of recalls. Moreover, we show that with a relatively small $B$, the expected sub-sample size $s$ can be much smaller than the number of training data $n$ at each bagging round, and the number of nearest neighbors $k$ can be reduced simultaneously, especially when the data are highly imbalanced, which leads to substantially lower time complexity and roughly the same space complexity. On the practical side, we conduct numerical experiments to verify the theoretical results on the benefits of the under-bagging technique by the promising AM performance and efficiency of our proposed algorithm. △ Less

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2108.08831 [pdf, ps, other]

U-match factorization: sparse homological algebra, lazy cycle representatives, and dualities in persistent (co)homology

Authors: Haibin Hang, Chad Giusti, Lori Ziegelmeier, Gregory Henselman-Petrusek

Abstract: Persistent homology is a leading tool in topological data analysis (TDA). Many problems in TDA can be solved via homological -- and indeed, linear -- algebra. However, matrices in this domain are typically large, with rows and columns numbered in billions. Low-rank approximation of such arrays typically destroys essential information; thus, new mathematical and computational paradigms are needed f… ▽ More Persistent homology is a leading tool in topological data analysis (TDA). Many problems in TDA can be solved via homological -- and indeed, linear -- algebra. However, matrices in this domain are typically large, with rows and columns numbered in billions. Low-rank approximation of such arrays typically destroys essential information; thus, new mathematical and computational paradigms are needed for very large, sparse matrices. We present the U-match matrix factorization scheme to address this challenge. U-match has two desirable features. First, it admits a compressed storage format that reduces the number of nonzero entries held in computer memory by one or more orders of magnitude over other common factorizations. Second, it permits direct solution of diverse problems in linear and homological algebra, without decompressing matrices stored in memory. These problems include look-up and retrieval of rows and columns; evaluation of birth/death times, and extraction of generators in persistent (co)homology; and, calculation of bases for boundary and cycle subspaces of filtered chain complexes. Such bases are key to unlocking a range of other topological techniques for use in TDA, and U-match factorization is designed to make such calculations broadly accessible to practitioners. As an application, we show that individual cycle representatives in persistent homology can be retrieved at time and memory costs orders of magnitude below current state of the art, via global duality. Moreover, the algebraic machinery needed to achieve this computation already exists in many modern solvers. △ Less

Submitted 20 August, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: 50 pages, 4 appendices; updated to fix typographical and metadata errors

MSC Class: 55N31 (Primary); 15A23; 65F50; 65F05 (Secondary)

arXiv:2107.08470 [pdf, other]

ANFIC: Image Compression Using Augmented Normalizing Flows

Authors: Yung-Han Ho, Chih-Chun Chan, Wen-Hsiao Peng, Hsueh-Ming Hang, Marek Domanski

Abstract: This paper introduces an end-to-end learned image compression system, termed ANFIC, based on Augmented Normalizing Flows (ANF). ANF is a new type of flow model, which stacks multiple variational autoencoders (VAE) for greater model expressiveness. The VAE-based image compression has gone mainstream, showing promising compression performance. Our work presents the first attempt to leverage VAE-base… ▽ More This paper introduces an end-to-end learned image compression system, termed ANFIC, based on Augmented Normalizing Flows (ANF). ANF is a new type of flow model, which stacks multiple variational autoencoders (VAE) for greater model expressiveness. The VAE-based image compression has gone mainstream, showing promising compression performance. Our work presents the first attempt to leverage VAE-based compression in a flow-based framework. ANFIC advances further compression efficiency by stacking and extending hierarchically multiple VAE's. The invertibility of ANF, together with our training strategies, enables ANFIC to support a wide range of quality levels without changing the encoding and decoding networks. Extensive experimental results show that in terms of PSNR-RGB, ANFIC performs comparably to or better than the state-of-the-art learned image compression. Moreover, it performs close to VVC intra coding, from low-rate compression up to nearly-lossless compression. In particular, ANFIC achieves the state-of-the-art performance, when extended with conditional convolution for variable rate compression with a single model. △ Less

Submitted 25 October, 2021; v1 submitted 18 July, 2021; originally announced July 2021.

arXiv:2106.10777 [pdf, other]

Manifold Matching via Deep Metric Learning for Generative Modeling

Authors: Mengyu Dai, Haibin Hang

Abstract: We propose a manifold matching approach to generative models which includes a distribution generator (or data generator) and a metric generator. In our framework, we view the real data set as some manifold embedded in a high-dimensional Euclidean space. The distribution generator aims at generating samples that follow some distribution condensed around the real data manifold. It is achieved by mat… ▽ More We propose a manifold matching approach to generative models which includes a distribution generator (or data generator) and a metric generator. In our framework, we view the real data set as some manifold embedded in a high-dimensional Euclidean space. The distribution generator aims at generating samples that follow some distribution condensed around the real data manifold. It is achieved by matching two sets of points using their geometric shape descriptors, such as centroid and $p$-diameter, with learned distance metric; the metric generator utilizes both real data and generated samples to learn a distance metric which is close to some intrinsic geodesic distance on the real data manifold. The produced distance metric is further used for manifold matching. The two networks are learned simultaneously during the training process. We apply the approach on both unsupervised and supervised learning tasks: in unconditional image generation task, the proposed method obtains competitive results compared with existing generative models; in super-resolution task, we incorporate the framework in perception-based models and improve visual qualities by producing samples with more natural textures. Experiments and analysis demonstrate the feasibility and effectiveness of the proposed framework. △ Less

Submitted 26 August, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

Comments: ICCV 2021. Code available at https://github.com/dzld00/pytorch-manifold-matching.git

arXiv:2106.05738 [pdf, other]

GBHT: Gradient Boosting Histogram Transform for Density Estimation

Authors: **gyi Cui, Hanyuan Hang, Yisen Wang, Zhouchen Lin

Abstract: In this paper, we propose a density estimation algorithm called \textit{Gradient Boosting Histogram Transform} (GBHT), where we adopt the \textit{Negative Log Likelihood} as the loss function to make the boosting procedure available for the unsupervised tasks. From a learning theory viewpoint, we first prove fast convergence rates for GBHT with the smoothness assumption that the underlying density… ▽ More In this paper, we propose a density estimation algorithm called \textit{Gradient Boosting Histogram Transform} (GBHT), where we adopt the \textit{Negative Log Likelihood} as the loss function to make the boosting procedure available for the unsupervised tasks. From a learning theory viewpoint, we first prove fast convergence rates for GBHT with the smoothness assumption that the underlying density function lies in the space $C^{0,α}$. Then when the target density function lies in spaces $C^{1,α}$, we present an upper bound for GBHT which is smaller than the lower bound of its corresponding base learner, in the sense of convergence rates. To the best of our knowledge, we make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems. In experiments, we not only conduct performance comparisons with the widely used KDE, but also apply GBHT to anomaly detection to showcase a further application of GBHT. △ Less

Submitted 10 June, 2021; originally announced June 2021.

Comments: Accepted to ICML2021. arXiv admin note: text overlap with arXiv:2106.01986

arXiv:2106.05731 [pdf, other]

Leveraged Weighted Loss for Partial Label Learning

Authors: Hongwei Wen, **gyi Cui, Hanyuan Hang, Jiabin Liu, Yisen Wang, Zhouchen Lin

Abstract: As an important branch of weakly supervised learning, partial label learning deals with data where each instance is assigned with a set of candidate labels, whereas only one of them is true. Despite many methodology studies on learning from partial labels, there still lacks theoretical understandings of their risk consistent properties under relatively weak assumptions, especially on the link betw… ▽ More As an important branch of weakly supervised learning, partial label learning deals with data where each instance is assigned with a set of candidate labels, whereas only one of them is true. Despite many methodology studies on learning from partial labels, there still lacks theoretical understandings of their risk consistent properties under relatively weak assumptions, especially on the link between theoretical results and the empirical choice of parameters. In this paper, we propose a family of loss functions named \textit{Leveraged Weighted} (LW) loss, which for the first time introduces the leverage parameter $β$ to consider the trade-off between losses on partial labels and non-partial ones. From the theoretical side, we derive a generalized result of risk consistency for the LW loss in learning from partial labels, based on which we provide guidance to the choice of the leverage parameter $β$. In experiments, we verify the theoretical guidance, and show the high effectiveness of our proposed LW loss on both benchmark and real datasets compared with other state-of-the-art partial label learning algorithms. △ Less

Submitted 10 June, 2021; originally announced June 2021.

Comments: Accepted to ICML2021 as long talk

arXiv:2106.01986 [pdf, other]

Gradient Boosted Binary Histogram Ensemble for Large-scale Regression

Authors: Hanyuan Hang, Tao Huang, Yuchao Cai, Hanfang Yang, Zhouchen Lin

Abstract: In this paper, we propose a gradient boosting algorithm for large-scale regression problems called \textit{Gradient Boosted Binary Histogram Ensemble} (GBBHE) based on binary histogram partition and ensemble learning. From the theoretical perspective, by assuming the Hölder continuity of the target function, we establish the statistical convergence rate of GBBHE in the space $C^{0,α}$ and… ▽ More In this paper, we propose a gradient boosting algorithm for large-scale regression problems called \textit{Gradient Boosted Binary Histogram Ensemble} (GBBHE) based on binary histogram partition and ensemble learning. From the theoretical perspective, by assuming the Hölder continuity of the target function, we establish the statistical convergence rate of GBBHE in the space $C^{0,α}$ and $C^{1,0}$, where a lower bound of the convergence rate for the base learner demonstrates the advantage of boosting. Moreover, in the space $C^{1,0}$, we prove that the number of iterations to achieve the fast convergence rate can be reduced by using ensemble regressor as the base learner, which improves the computational efficiency. In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), Breiman's forest, and kernel-based methods, our GBBHE algorithm shows promising performance with less running time on large-scale datasets. △ Less

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2012.08392 [pdf, other]

FINED: Fast Inference Network for Edge Detection

Authors: Jan Kristanto Wibisono, Hsueh-Ming Hang

Abstract: In this paper, we address the design of lightweight deep learning-based edge detection. The deep learning technology offers a significant improvement on the edge detection accuracy. However, typical neural network designs have very high model complexity, which prevents it from practical usage. In contrast, we propose a Fast Inference Network for Edge Detection (FINED), which is a lightweight neura… ▽ More In this paper, we address the design of lightweight deep learning-based edge detection. The deep learning technology offers a significant improvement on the edge detection accuracy. However, typical neural network designs have very high model complexity, which prevents it from practical usage. In contrast, we propose a Fast Inference Network for Edge Detection (FINED), which is a lightweight neural net dedicated to edge detection. By carefully choosing proper components for edge detection purpose, we can achieve the state-of-the-art accuracy in edge detection while significantly reducing its complexity. Another key contribution in increasing the inferencing speed is introducing the training helper concept. The extra subnetworks (training helper) are employed in training but not used in inferencing. It can further reduce the model complexity and yet maintain the same level of accuracy. Our experiments show that our systems outperform all the current edge detectors at about the same model (parameter) size. △ Less

Submitted 15 December, 2020; originally announced December 2020.

Comments: Submitted to ICME 2021

arXiv:2012.07462 [pdf, other]

Learned Video Codec with Enriched Reconstruction for CLIC P-frame Coding

Authors: David Alexandre, Hsueh-Ming Hang

Abstract: This paper proposes a learning-based video codec, specifically used for Challenge on Learned Image Compression (CLIC, CVPRWorkshop) 2020 P-frame coding. More specifically, we designed a compressor network with Refine-Net for coding residual signals and motion vectors. Also, for motion estimation, we introduced a hierarchical, attention-based ME-Net. To verify our design, we conducted an extensive… ▽ More This paper proposes a learning-based video codec, specifically used for Challenge on Learned Image Compression (CLIC, CVPRWorkshop) 2020 P-frame coding. More specifically, we designed a compressor network with Refine-Net for coding residual signals and motion vectors. Also, for motion estimation, we introduced a hierarchical, attention-based ME-Net. To verify our design, we conducted an extensive ablation study on our modules and different input formats. Our video codec demonstrates its performance by using the perfect reference frame at the decoder side specified by the CLIC P-frame Challenge. The experimental result shows that our proposed codec is very competitive with the Challenge top performers in terms of quality metrics. △ Less

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2005.13862 [pdf, other]

Traditional Method Inspired Deep Neural Network for Edge Detection

Authors: Jan Kristanto Wibisono, Hsueh-Ming Hang

Abstract: Recently, Deep-Neural-Network (DNN) based edge prediction is progressing fast. Although the DNN based schemes outperform the traditional edge detectors, they have much higher computational complexity. It could be that the DNN based edge detectors often adopt the neural net structures designed for high-level computer vision tasks, such as image segmentation and object recognition. Edge detection is… ▽ More Recently, Deep-Neural-Network (DNN) based edge prediction is progressing fast. Although the DNN based schemes outperform the traditional edge detectors, they have much higher computational complexity. It could be that the DNN based edge detectors often adopt the neural net structures designed for high-level computer vision tasks, such as image segmentation and object recognition. Edge detection is a rather local and simple job, the over-complicated architecture and massive parameters may be unnecessary. Therefore, we propose a traditional method inspired framework to produce good edges with minimal complexity. We simplify the network architecture to include Feature Extractor, Enrichment, and Summarizer, which roughly correspond to gradient, low pass filter, and pixel connection in the traditional edge detection schemes. The proposed structure can effectively reduce the complexity and retain the edge prediction quality. Our TIN2 (Traditional Inspired Network) model has an accuracy higher than the recent BDCN2 (Bi-Directional Cascade Network) but with a smaller model. △ Less

Submitted 28 May, 2020; originally announced May 2020.

arXiv:1912.04738 [pdf, other]

Histogram Transform Ensembles for Large-scale Regression

Authors: Hanyuan Hang, Zhouchen Lin, Xiaoyu Liu, Hongwei Wen

Abstract: We propose a novel algorithm for large-scale regression problems named histogram transform ensembles (HTE), composed of random rotations, stretchings, and translations. First of all, we investigate the theoretical properties of HTE when the regression function lies in the Hölder space $C^{k,α}$, $k \in \mathbb{N}_0$, $α\in (0,1]$. In the case that $k=0, 1$, we adopt the constant regressors and dev… ▽ More We propose a novel algorithm for large-scale regression problems named histogram transform ensembles (HTE), composed of random rotations, stretchings, and translations. First of all, we investigate the theoretical properties of HTE when the regression function lies in the Hölder space $C^{k,α}$, $k \in \mathbb{N}_0$, $α\in (0,1]$. In the case that $k=0, 1$, we adopt the constant regressors and develop the naïve histogram transforms (NHT). Within the space $C^{0,α}$, although almost optimal convergence rates can be derived for both single and ensemble NHT, we fail to show the benefits of ensembles over single estimators theoretically. In contrast, in the subspace $C^{1,α}$, we prove that if $d \geq 2(1+α)/α$, the lower bound of the convergence rates for single NHT turns out to be worse than the upper bound of the convergence rates for ensemble NHT. In the other case when $k \geq 2$, the NHT may no longer be appropriate in predicting smoother regression functions. Instead, we apply kernel histogram transforms (KHT) equipped with smoother regressors such as support vector machines (SVMs), and it turns out that both single and ensemble KHT enjoy almost optimal convergence rates. Then we validate the above theoretical results by numerical experiments. On the one hand, simulations are conducted to elucidate that ensemble NHT outperform single NHT. On the other hand, the effects of bin sizes on accuracy of both NHT and KHT also accord with theoretical analysis. Last but not least, in the real-data experiments, comparisons between the ensemble KHT, equipped with adaptive histogram transforms, and other state-of-the-art large-scale regression estimators verify the effectiveness and accuracy of our algorithm. △ Less

Submitted 8 December, 2019; originally announced December 2019.

Comments: arXiv admin note: text overlap with arXiv:1911.11581

arXiv:1911.11581 [pdf, other]

Histogram Transform Ensembles for Density Estimation

Authors: Hanyuan Hang

Abstract: We investigate an algorithm named histogram transform ensembles (HTE) density estimator whose effectiveness is supported by both solid theoretical analysis and significant experimental performance. On the theoretical side, by decomposing the error term into approximation error and estimation error, we are able to conduct the following analysis: First of all, we establish the universal consistency… ▽ More We investigate an algorithm named histogram transform ensembles (HTE) density estimator whose effectiveness is supported by both solid theoretical analysis and significant experimental performance. On the theoretical side, by decomposing the error term into approximation error and estimation error, we are able to conduct the following analysis: First of all, we establish the universal consistency under $L_1(μ)$-norm. Secondly, under the assumption that the underlying density function resides in the Hölder space $C^{0,α}$, we prove almost optimal convergence rates for both single and ensemble density estimators under $L_1(μ)$-norm and $L_{\infty}(μ)$-norm for different tail distributions, whereas in contrast, for its subspace $C^{1,α}$ consisting of smoother functions, almost optimal convergence rates can only be established for the ensembles and the lower bound of the single estimators illustrates the benefits of ensembles over single density estimators. In the experiments, we first carry out simulations to illustrate that histogram transform ensembles surpass single histogram transforms, which offers powerful evidence to support the theoretical results in the space $C^{1,α}$. Moreover, to further exert the experimental performances, we propose an adaptive version of HTE and study the parameters by generating several synthetic datasets with diversities in dimensions and distributions. Last but not least, real data experiments with other state-of-the-art density estimators demonstrate the accuracy of the adaptive HTE algorithm. △ Less

Submitted 24 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1905.03729

arXiv:1907.10015 [pdf]

Exploring Semantic Segmentation on the DCT Representation

Authors: Shao-Yuan Lo, Hsueh-Ming Hang

Abstract: Typical convolutional networks are trained and conducted on RGB images. However, images are often compressed for memory savings and efficient transmission in real-world applications. In this paper, we explore methods for performing semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard. We first rearrange the DCT coefficients to form a preferred in… ▽ More Typical convolutional networks are trained and conducted on RGB images. However, images are often compressed for memory savings and efficient transmission in real-world applications. In this paper, we explore methods for performing semantic segmentation on the discrete cosine transform (DCT) representation defined by the JPEG standard. We first rearrange the DCT coefficients to form a preferred input type, then we tailor an existing network to the DCT inputs. The proposed method has an accuracy close to the RGB model at about the same network complexity. Moreover, we investigate the impact of selecting different DCT components on segmentation performance. With a proper selection, one can achieve the same level accuracy using only 36% of the DCT coefficients. We further show the robustness of our method under the quantization errors. To our knowledge, this paper is the first to explore semantic segmentation on the DCT representation. △ Less

Submitted 29 December, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: Accepted in ACM International Conference on Multimedia in Asia (MMAsia) 2019

arXiv:1907.09438 [pdf]

Multi-Class Lane Semantic Segmentation using Efficient Convolutional Networks

Authors: Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, **g-Jhih Lin

Abstract: Lane detection plays an important role in a self-driving vehicle. Several studies leverage a semantic segmentation network to extract robust lane features, but few of them can distinguish different types of lanes. In this paper, we focus on the problem of multi-class lane semantic segmentation. Based on the observation that the lane is a small-size and narrow-width object in a road scene image, we… ▽ More Lane detection plays an important role in a self-driving vehicle. Several studies leverage a semantic segmentation network to extract robust lane features, but few of them can distinguish different types of lanes. In this paper, we focus on the problem of multi-class lane semantic segmentation. Based on the observation that the lane is a small-size and narrow-width object in a road scene image, we propose two techniques, Feature Size Selection (FSS) and Degressive Dilation Block (DD Block). The FSS allows a network to extract thin lane features using appropriate feature sizes. To acquire fine-grained spatial information, the DD Block is made of a series of dilated convolutions with degressive dilation rates. Experimental results show that the proposed techniques provide obvious improvement in accuracy, while they achieve the same or faster inference speed compared to the baseline system, and can run at real-time on high-resolution images. △ Less

Submitted 22 July, 2019; originally announced July 2019.

Comments: Accepted in IEEE International Workshop on Multimedia Signal Processing (MMSP) 2019

arXiv:1906.10094 [pdf, other]

Density-based Clustering with Best-scored Random Forest

Authors: Hanyuan Hang, Yuchao Cai, Hanfang Yang

Abstract: Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called "best-scored clustering forest" that can obtain the optimal level and determine corresponding clusters. The terminology "best-scored" means to select one random tree with the best empirical performance out of a cer… ▽ More Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called "best-scored clustering forest" that can obtain the optimal level and determine corresponding clusters. The terminology "best-scored" means to select one random tree with the best empirical performance out of a certain number of purely random tree candidates. From the theoretical perspective, we first show that consistency of our proposed algorithm can be guaranteed. Moreover, under certain mild restrictions on the underlying density functions and target clusters, even fast convergence rates can be achieved. Last but not least, comparisons with other state-of-the-art clustering methods in the numerical experiments demonstrate accuracy of our algorithm on both synthetic data and several benchmark real data sets. △ Less

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1905.11028 [pdf, other]

Best-scored Random Forest Classification

Authors: Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

Abstract: We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates as each single tree in the forest. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspectiv… ▽ More We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates as each single tree in the forest. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspective, within the framework of regularized empirical risk minimization penalized on the number of splits, we establish almost optimal convergence rates for the proposed best-scored random trees under certain conditions which can be extended to the best-scored random forest. In addition, we present a counterexample to illustrate that in order to ensure the consistency of the forest, every dimension must have the chance to be split. In the numerical experiments, for the sake of efficiency, we employ an adaptive random splitting criterion. Comparative experiments with other state-of-art classification methods demonstrate the accuracy of our best-scored random forest. △ Less

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1905.03729 [pdf, other]

Best-scored Random Forest Density Estimation

Authors: Hanyuan Hang, Hongwei Wen

Abstract: This paper presents a brand new nonparametric density estimation strategy named the best-scored random forest density estimation whose effectiveness is supported by both solid theoretical analysis and significant experimental performance. The terminology best-scored stands for selecting one density tree with the best estimation performance out of a certain number of purely random density tree cand… ▽ More This paper presents a brand new nonparametric density estimation strategy named the best-scored random forest density estimation whose effectiveness is supported by both solid theoretical analysis and significant experimental performance. The terminology best-scored stands for selecting one density tree with the best estimation performance out of a certain number of purely random density tree candidates and we then name the selected one the best-scored random density tree. In this manner, the ensemble of these selected trees that is the best-scored random density forest can achieve even better estimation results than simply integrating trees without selection. From the theoretical perspective, by decomposing the error term into two, we are able to carry out the following analysis: First of all, we establish the consistency of the best-scored random density trees under $L_1$-norm. Secondly, we provide the convergence rates of them under $L_1$-norm concerning with three different tail assumptions, respectively. Thirdly, the convergence rates under $L_{\infty}$-norm is presented. Last but not least, we also achieve the above convergence rates analysis for the best-scored random density forest. When conducting comparative experiments with other state-of-the-art density estimation approaches on both synthetic and real data sets, it turns out that our algorithm has not only significant advantages in terms of estimation accuracy over other methods, but also stronger resistance to the curse of dimensionality. △ Less

Submitted 9 May, 2019; originally announced May 2019.

arXiv:1905.03438 [pdf, other]

Two-stage Best-scored Random Forest for Large-scale Regression

Authors: Hanyuan Hang, Yingyi Chen, Johan A. K. Suykens

Abstract: We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). "Best-scored" means to select one regression tree with the best empirical performance out of a certain number of purely random regression tree candidates, and "two-stage" means to divide the original random tree splitting procedure into two: In stage one, the feature space… ▽ More We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). "Best-scored" means to select one regression tree with the best empirical performance out of a certain number of purely random regression tree candidates, and "two-stage" means to divide the original random tree splitting procedure into two: In stage one, the feature space is partitioned into non-overlap** cells; in stage two, child trees grow separately on these cells. The strengths of this algorithm can be summarized as follows: First of all, the pure randomness in TBRF leads to the almost optimal learning rates, and also makes ensemble learning possible, which resolves the boundary discontinuities long plaguing the existing algorithms. Secondly, the two-stage procedure paves the way for parallel computing, leading to computational efficiency. Last but not least, TBRF can serve as an inclusive framework where different mainstream regression strategies such as linear predictor and least squares support vector machines (LS-SVMs) can also be incorporated as value assignment approaches on leaves of the child trees, depending on the characteristics of the underlying data sets. Numerical assessments on comparisons with other state-of-the-art methods on several large-scale real data sets validate the promising prediction accuracy and high computational efficiency of our algorithm. △ Less

Submitted 9 May, 2019; originally announced May 2019.

arXiv:1905.00190 [pdf, other]

Learned Image Compression with Soft Bit-based Rate-Distortion Optimization

Authors: David Alexandre, Chih-Peng Chang, Wen-Hsiao Peng, Hsueh-Ming Hang

Abstract: This paper introduces the notion of soft bits to address the rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective to strike a balance between distortion and rate. They are faced with the zero gradient issue due to quantization and the difficulty of estimating the rate accurately. Inspired by soft qu… ▽ More This paper introduces the notion of soft bits to address the rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective to strike a balance between distortion and rate. They are faced with the zero gradient issue due to quantization and the difficulty of estimating the rate accurately. Inspired by soft quantization, we represent quantization indices of feature maps with differentiable soft bits. This allows us to couple tightly the rate estimation with context-adaptive binary arithmetic coding. It also provides a differentiable distortion objective function. Experimental results show that our approach achieves the state-of-the-art compression performance among the learning-based schemes in terms of MS-SSIM and PSNR. △ Less

Submitted 1 May, 2019; originally announced May 2019.

arXiv:1904.05022 [pdf]

DSNet: An Efficient CNN for Road Scene Segmentation

Authors: **-Rong Chen, Hsueh-Ming Hang, Sheng-Wei Chan, **g-Jhih Lin

Abstract: Road scene understanding is a critical component in an autonomous driving system. Although the deep learning-based road scene segmentation can achieve very high accuracy, its complexity is also very high for develo** real-time applications. It is challenging to design a neural net with high accuracy and low computational complexity. To address this issue, we investigate the advantages and disadv… ▽ More Road scene understanding is a critical component in an autonomous driving system. Although the deep learning-based road scene segmentation can achieve very high accuracy, its complexity is also very high for develo** real-time applications. It is challenging to design a neural net with high accuracy and low computational complexity. To address this issue, we investigate the advantages and disadvantages of several popular CNN architectures in terms of speed, storage and segmentation accuracy. We start from the Fully Convolutional Network (FCN) with VGG, and then we study ResNet and DenseNet. Through detailed experiments, we pick up the favorable components from the existing architectures and at the end, we construct a light-weight network architecture based on the DenseNet. Our proposed network, called DSNet, demonstrates a real-time testing (inferencing) ability (on the popular GPU platform) and it maintains an accuracy comparable with most previous systems. We test our system on several datasets including the challenging Cityscapes dataset (resolution of 1024x512) with an mIoU of about 69.1 % and runtime of 0.0147 second per image on a single GTX 1080Ti. We also design a more accurate model but at the price of a slower speed, which has an mIoU of about 72.6 % on the CamVid dataset. △ Less

Submitted 10 April, 2019; originally announced April 2019.

arXiv:1902.07385 [pdf, other]

An Autoencoder-based Learned Image Compressor: Description of Challenge Proposal by NCTU

Authors: David Alexandre, Chih-Peng Chang, Wen-Hsiao Peng, Hsueh-Ming Hang

Abstract: We propose a lossy image compression system using the deep-learning autoencoder structure to participate in the Challenge on Learned Image Compression (CLIC) 2018. Our autoencoder uses the residual blocks with skip connections to reduce the correlation among image pixels and condense the input image into a set of feature maps, a compact representation of the original image. The bit allocation and… ▽ More We propose a lossy image compression system using the deep-learning autoencoder structure to participate in the Challenge on Learned Image Compression (CLIC) 2018. Our autoencoder uses the residual blocks with skip connections to reduce the correlation among image pixels and condense the input image into a set of feature maps, a compact representation of the original image. The bit allocation and bitrate control are implemented by using the importance maps and quantizer. The importance maps are generated by a separate neural net in the encoder. The autoencoder and the importance net are trained jointly based on minimizing a weighted sum of mean squared error, MS-SSIM, and a rate estimate. Our aim is to produce reconstructed images with good subjective quality subject to the 0.15 bits-per-pixel constraint. △ Less

Submitted 19 February, 2019; originally announced February 2019.

Comments: Published in CVPR 2018: Workshop And Challenge On Learned Image Compression

arXiv:1810.02321 [pdf, ps, other]

Optimal Learning with Anisotropic Gaussian SVMs

Authors: Hanyuan Hang, Ingo Steinwart

Abstract: This paper investigates the nonparametric regression problem using SVMs with anisotropic Gaussian RBF kernels. Under the assumption that the target functions are resided in certain anisotropic Besov spaces, we establish the almost optimal learning rates, more precisely, optimal up to some logarithmic factor, presented by the effective smoothness. By taking the effective smoothness into considerati… ▽ More This paper investigates the nonparametric regression problem using SVMs with anisotropic Gaussian RBF kernels. Under the assumption that the target functions are resided in certain anisotropic Besov spaces, we establish the almost optimal learning rates, more precisely, optimal up to some logarithmic factor, presented by the effective smoothness. By taking the effective smoothness into consideration, our almost optimal learning rates are faster than those obtained with the underlying RKHSs being certain anisotropic Sobolev spaces. Moreover, if the target function depends only on fewer dimensions, faster learning rates can be further achieved. △ Less

Submitted 4 October, 2018; originally announced October 2018.

arXiv:1809.09077 [pdf]

Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation

Authors: Shang-Wei Hung, Shao-Yuan Lo, Hsueh-Ming Hang

Abstract: Semantic segmentation has made encouraging progress due to the success of deep convolutional networks in recent years. Meanwhile, depth sensors become prevalent nowadays, so depth maps can be acquired more easily. However, there are few studies that focus on the RGB-D semantic segmentation task. Exploiting the depth information effectiveness to improve performance is a challenge. In this paper, we… ▽ More Semantic segmentation has made encouraging progress due to the success of deep convolutional networks in recent years. Meanwhile, depth sensors become prevalent nowadays, so depth maps can be acquired more easily. However, there are few studies that focus on the RGB-D semantic segmentation task. Exploiting the depth information effectiveness to improve performance is a challenge. In this paper, we propose a novel solution named LDFNet, which incorporates Luminance, Depth and Color information by a fusion-based network. It includes a sub-network to process depth maps and employs luminance images to assist the depth information in processes. LDFNet outperforms the other state-of-art systems on the Cityscapes dataset, and its inference speed is faster than most of the existing networks. The experimental results show the effectiveness of the proposed multi-modal fusion network and its potential for practical applications. △ Less

Submitted 19 May, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

Comments: Accepted in IEEE International Conference on Image Processing (ICIP) 2019

arXiv:1809.06323 [pdf]

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Authors: Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, **g-Jhih Lin

Abstract: Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration on efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional… ▽ More Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration on efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet), which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size. EDANet is 2.7 times faster than the existing fast segmentation network, ICNet, while it achieves a similar mIoU score without any additional context module, post-processing scheme, and pretrained model. We evaluate EDANet on Cityscapes and CamVid datasets, and compare it with the other state-of-art systems. Our network can run with the high-resolution inputs at the speed of 108 FPS on one GTX 1080Ti. △ Less

Submitted 28 December, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

Comments: Accepted in ACM International Conference on Multimedia in Asia (MMAsia) 2019 [Best Paper Award]

arXiv:1809.03994 [pdf]

Efficient Road Lane Marking Detection with Deep Learning

Authors: **-Rong Chen, Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, **g-Jhih Lin

Abstract: Lane mark detection is an important element in the road scene analysis for Advanced Driver Assistant System (ADAS). Limited by the onboard computing power, it is still a challenge to reduce system complexity and maintain high accuracy at the same time. In this paper, we propose a Lane Marking Detector (LMD) using a deep convolutional neural network to extract robust lane marking features. To impro… ▽ More Lane mark detection is an important element in the road scene analysis for Advanced Driver Assistant System (ADAS). Limited by the onboard computing power, it is still a challenge to reduce system complexity and maintain high accuracy at the same time. In this paper, we propose a Lane Marking Detector (LMD) using a deep convolutional neural network to extract robust lane marking features. To improve its performance with a target of lower complexity, the dilated convolution is adopted. A shallower and thinner structure is designed to decrease the computational cost. Moreover, we also design post-processing algorithms to construct 3rd-order polynomial models to fit into the curved lanes. Our system shows promising results on the captured road scenes. △ Less

Submitted 11 September, 2018; originally announced September 2018.

Comments: Accepted at International Conference on Digital Signal Processing (DSP) 2018

arXiv:1809.00781 [pdf, ps, other]

doi 10.1109/TIT.2019.2951759

Matrix Infinitely Divisible Series: Tail Inequalities and Their Applications

Authors: Chao Zhang, Xianjie Gao, Min-Hsiu Hsieh, Hanyuan Hang, Dacheng Tao

Abstract: In this paper, we study tail inequalities of the largest eigenvalue of a matrix infinitely divisible (i.d.) series, which is a finite sum of fixed matrices weighted by i.d. random variables. We obtain several types of tail inequalities, including Bennett-type and Bernstein-type inequalities. This allows us to further bound the expectation of the spectral norm of a matrix i.d. series. Moreover, by… ▽ More In this paper, we study tail inequalities of the largest eigenvalue of a matrix infinitely divisible (i.d.) series, which is a finite sum of fixed matrices weighted by i.d. random variables. We obtain several types of tail inequalities, including Bennett-type and Bernstein-type inequalities. This allows us to further bound the expectation of the spectral norm of a matrix i.d. series. Moreover, by develo** a new lower-bound function for $Q(s)=(s+1)\log(s+1)-s$ that appears in the Bennett-type inequality, we derive a tighter tail inequality of the largest eigenvalue of the matrix i.d. series than the Bernstein-type inequality when the matrix dimension is high. The resulting lower-bound function is of independent interest and can improve any Bennett-type concentration inequality that involves the function $Q(s)$. The class of i.d. probability distributions is large and includes Gaussian and Poisson distributions, among many others. Therefore, our results encompass the existing work \cite{tropp2012user} on matrix Gaussian series as a special case. Lastly, we show that the tail inequalities of a matrix i.d. series have applications in several optimization problems including the chance constrained optimization problem and the quadratic optimization problem with orthogonality constraints. In addition, we also use the resulting tail bounds to show that random matrices constructed from i.d. random variables satisfy the restricted isometry property (RIP) when it acts as a measurement matrix in compressed sensing. △ Less

Submitted 30 May, 2022; v1 submitted 3 September, 2018; originally announced September 2018.

Comments: Final Version

Journal ref: IEEE Transactions on Information Theory, vol. 66, no. 2, pp.1099-1117 (2020)

arXiv:1605.02887 [pdf, ps, other]

Learning theory estimates with observations from general stationary stochastic processes

Authors: Hanyuan Hang, Yunlong Feng, Ingo Steinwart, Johan A. K. Suykens

Abstract: This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by \emph{general}, we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes… ▽ More This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by \emph{general}, we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for i.i.d.~processes, our learning rates for ERM recover the optimal rates. On the other hand, for non-i.i.d.~processes including geometrically $α$-mixing Markov processes, geometrically $α$-mixing processes with restricted decay, $φ$-mixing processes, and (time-reversed) geometrically $\mathcal{C}$-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called "effective number of observations" for various mixing processes. △ Less

Submitted 10 May, 2016; originally announced May 2016.

Comments: arXiv admin note: text overlap with arXiv:1501.03059

Showing 1–37 of 37 results for author: Hang, H