-
Themes Informed Audio-visual Correspondence Learning
Authors:
Runze Su,
Fei Tao,
Xudong Liu,
Haoran Wei,
Xiaorong Mei,
Zhiyao Duan,
Lei Yuan,
Ji Liu,
Yuying Xie
Abstract:
Applications built around short user-generated videos (UGV), such as Snapchat and YouTube short videos, have boomed recently, raising many multimodal machine learning tasks. Among them, learning the correspondence between audio and visual information in videos is a challenging one. Most previous work on audio-visual correspondence (AVC) learning investigated only constrained videos or simple settings, which may not fit UGV applications. In this paper, we propose new principles for AVC and introduce a new framework that takes videos' themes into account to facilitate AVC learning. We also release the KWAI-AD-AudVis corpus, which contains 85432 short advertisement videos (around 913 hours) made by users. We evaluated our proposed approach on this corpus, and it outperformed the baseline by 23.15% in absolute terms.
Submitted 19 October, 2020; v1 submitted 14 September, 2020;
originally announced September 2020.
-
SMAPGAN: Generative Adversarial Network Based Semi-Supervised Styled Map Tiles Generating Method
Authors:
X. Chen,
S. Chen,
T. Xu,
B. Yin,
X. Mei,
J. Peng,
H. Li
Abstract:
Traditional online map tiles, widely used on the Internet by services such as Google Map and Baidu Map, are rendered from vector data. Because generating vector data is time-consuming, keeping online map tiles up to date from vector data is difficult. A shortcut is to generate map tiles directly from remote sensing images, which can be acquired promptly without vector data. However, this task used to be challenging or even impossible. Inspired by image-to-image translation (img2img) techniques based on generative adversarial networks (GAN), we proposed a semi-supervised Generation of styled map Tiles based on Generative Adversarial Network (SMAPGAN) model to generate styled map tiles directly from remote sensing images. In this model, we designed a semi-supervised learning strategy to pre-train SMAPGAN on rich unpaired samples and fine-tune it on the limited paired samples available in practice. We also designed an image gradient L1 loss and an image gradient structure loss to generate styled map tiles with global topological relationships and detailed edge curves of objects, which are important in cartography. Moreover, we proposed the edge structural similarity index (ESSI) as a metric to evaluate the topological consistency between generated map tiles and ground truths. Experimental results show that SMAPGAN outperforms state-of-the-art (SOTA) works according to mean squared error, structural similarity index, and ESSI. SMAPGAN also won more approval than SOTA in a human perceptual test on the visual realism of cartography. Our work shows that SMAPGAN is potentially a new paradigm for producing styled map tiles. Our implementation of SMAPGAN is available at https://github.com/imcsq/SMAPGAN.
Submitted 1 April, 2021; v1 submitted 20 January, 2020;
originally announced January 2020.
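Illustration (not from the paper): the SMAPGAN abstract names an image gradient L1 loss but does not define it. The sketch below shows one plausible reading, assuming gradients are forward finite differences and the loss is the mean absolute difference between the gradients of a generated tile and its ground truth. The function names `image_gradients` and `gradient_l1_loss` are hypothetical, and the authors' actual formulation may differ.

```python
import numpy as np


def image_gradients(img):
    """Forward finite differences along rows (dy) and columns (dx).

    Assumption: a simple difference operator; the paper may use another.
    """
    img = np.asarray(img, dtype=float)
    dy = img[1:, :] - img[:-1, :]
    dx = img[:, 1:] - img[:, :-1]
    return dy, dx


def gradient_l1_loss(generated, target):
    """Mean absolute difference between the gradients of two 2-D images."""
    gdy, gdx = image_gradients(generated)
    tdy, tdx = image_gradients(target)
    return np.abs(gdy - tdy).mean() + np.abs(gdx - tdx).mean()
```

Under this reading, the loss ignores a constant brightness offset between the two images and penalizes only differences in edge structure, which matches the abstract's emphasis on edge curves.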
-
LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning
Authors:
Huaiyu Li,
Weiming Dong,
Xing Mei,
Chongyang Ma,
Feiyue Huang,
Bao-Gang Hu
Abstract:
In this work, we propose a novel meta-learning approach for few-shot classification, which learns transferable prior knowledge across tasks and directly produces network parameters for similar unseen tasks with training samples. Our approach, called LGM-Net, includes two key modules, namely, TargetNet and MetaNet. The TargetNet module is a neural network for solving a specific task and the MetaNet module aims at learning to generate functional weights for TargetNet by observing training samples. We also present an intertask normalization strategy for the training process to leverage common information shared across different tasks. The experimental results on Omniglot and miniImageNet datasets demonstrate that LGM-Net can effectively adapt to similar unseen tasks and achieve competitive performance, and the results on synthetic datasets show that transferable prior knowledge is learned by the MetaNet module via mapping training data to functional weights. LGM-Net enables fast learning and adaptation since no further tuning steps are required compared to other meta-learning approaches.
Submitted 14 May, 2019;
originally announced May 2019.
-
Unsupervised Ranking of Multi-Attribute Objects Based on Principal Curves
Authors:
Chun-Guo Li,
Xing Mei,
Bao-Gang Hu
Abstract:
Unsupervised ranking faces one critical challenge in evaluation applications: no ground truth is available. While PageRank and its variants offer a good solution in related subjects, they are applicable only to ranking from link-structure data. In this work, we focus on unsupervised ranking from multi-attribute data, which is also common in evaluation tasks. To overcome the challenge, we propose five essential meta-rules for the design and assessment of unsupervised ranking approaches: scale and translation invariance, strict monotonicity, linear/nonlinear capacities, smoothness, and explicitness of parameter size. These meta-rules are regarded as high-level knowledge for unsupervised ranking tasks. Inspired by the works in [8] and [14], we propose a ranking principal curve (RPC) model, which learns a one-dimensional manifold function to perform unsupervised ranking tasks on multi-attribute observations. Furthermore, the RPC is modeled as a cubic Bézier curve with control points restricted to the interior of a hypercube, thereby complying with all five meta-rules to infer a reasonable ranking list. With the control points as model parameters, one is able to understand the learned manifold and to interpret the ranking list semantically. Numerical experiments with the presented RPC model are conducted on two open datasets from different ranking applications. In comparison with state-of-the-art approaches, the new model produces more reasonable ranking lists.
Submitted 18 February, 2014;
originally announced February 2014.
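Illustration (not from the paper): the RPC abstract describes ranking multi-attribute observations along a cubic Bézier curve. The sketch below shows only the scoring step, under several assumptions: the four control points are given by hand rather than learned, the attributes are already normalized to the unit hypercube, and each observation is scored by the curve parameter t of its nearest curve point, found by a grid search. The names `bezier_cubic` and `rank_scores` and the `grid_size` parameter are hypothetical.

```python
import numpy as np


def bezier_cubic(control_points, t):
    """Evaluate a cubic Bezier curve (4 control points, any dimension) at t."""
    p = np.asarray(control_points, dtype=float)             # shape (4, d)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]  # shape (n, 1)
    return ((1 - t) ** 3 * p[0]
            + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2]
            + t ** 3 * p[3])                                 # shape (n, d)


def rank_scores(points, control_points, grid_size=1001):
    """Score each point by the parameter t of its nearest curve sample."""
    ts = np.linspace(0.0, 1.0, grid_size)
    curve = bezier_cubic(control_points, ts)                 # (grid_size, d)
    pts = np.asarray(points, dtype=float)                    # (m, d)
    # Squared distance from every point to every curve sample.
    d2 = ((pts[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    return ts[d2.argmin(axis=1)]
```

Sorting the observations by the returned scores (for example with `np.argsort`) yields a ranking list. Learning the control points from data, which is the core of the RPC model, is not covered by this sketch.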