-
FAGhead: Fully Animate Gaussian Head from Monocular Videos
Authors:
Yixin Xuan,
Xinyang Li,
Gongxin Yao,
Shiwei Zhou,
Donghui Sun,
Xiaoxin Chen,
Yu Pan
Abstract:
High-fidelity reconstruction of 3D human avatars has a wild application in visual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Repre…
▽ More
High-fidelity reconstruction of 3D human avatars has a wild application in visual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Representation Field (PLRF) with learnable Gaussian point positions to enhance reconstruction performance. Meanwhile, to effectively manage the edges of avatars, we introduced the alpha rendering to supervise the alpha value of each pixel. Extensive experimental results on the open-source datasets and our capturing datasets demonstrate that our approach is able to generate high-fidelity 3D head avatars and fully control the expression and pose of the virtual avatars, which is outperforming than existing works.
△ Less
Submitted 28 June, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Federated Learning with Limited Node Labels
Authors:
Bisheng Tang,
Xiaojun Chen,
Shaopu Wang,
Yuexin Xuan,
Zhendong Zhao
Abstract:
Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-…
▽ More
Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-pass global representations to other parties' GNNs. Moreover, existing SFL models require substantial labeled data, which limits their practical applications. To overcome these limitations, we present a novel SFL framework called FedMpa that aims to learn cross-subgraph node representations. FedMpa first trains a multilayer perceptron (MLP) model using a small amount of data and then propagates the federated feature to the local structures. To further improve the embedding representation of nodes with local subgraphs, we introduce the FedMpae method, which reconstructs the local graph structure with an innovation view that applies pooling operation to form super-nodes. Our extensive experiments on six graph datasets demonstrate that FedMpa is highly effective in node classification. Furthermore, our ablation experiments verify the effectiveness of FedMpa.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
GGAvatar: Geometric Adjustment of Gaussian Head Avatar
Authors:
Xinyang Li,
Jiaxin Wang,
Yixin Xuan,
Gongxin Yao,
Yu Pan
Abstract:
We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure, featuring two core modules: Neutral Gaussian Initialization Module and Geometry Morph Adjuster. Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, employing an ad…
▽ More
We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure, featuring two core modules: Neutral Gaussian Initialization Module and Geometry Morph Adjuster. Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, employing an adaptive density control strategy to model the geometric structure of the target subject with neutral expressions. Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained low-dimensional representations of deformation behaviors to address the Linear Blend Skinning formula's limitations effectively. Extensive experiments show that GGAvatar can produce high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.
△ Less
Submitted 20 May, 2024;
originally announced May 2024.
-
Diffusion Cross-domain Recommendation
Authors:
Yuner Xuan
Abstract:
It is always a challenge for recommender systems to give high-quality outcomes to cold-start users. One potential solution to alleviate the data sparsity problem for cold-start users in the target domain is to add data from the auxiliary domain. Finding a proper way to extract knowledge from an auxiliary domain and transfer it into a target domain is one of the main objectives for cross-domain rec…
▽ More
It is always a challenge for recommender systems to give high-quality outcomes to cold-start users. One potential solution to alleviate the data sparsity problem for cold-start users in the target domain is to add data from the auxiliary domain. Finding a proper way to extract knowledge from an auxiliary domain and transfer it into a target domain is one of the main objectives for cross-domain recommendation (CDR) research. Among the existing methods, map** approach is a popular one to implement cross-domain recommendation models (CDRs). For models of this type, a map** module plays the role of transforming data from one domain to another. It primarily determines the performance of map** approach CDRs. Recently, diffusion probability models (DPMs) have achieved impressive success for image synthesis related tasks. They involve recovering images from noise-added samples, which can be viewed as a data transformation process with outstanding performance. To further enhance the performance of CDRs, we first reveal the potential connection between DPMs and map** modules of CDRs, and then propose a novel CDR model named Diffusion Cross-domain Recommendation (DiffCDR). More specifically, we first adopt the theory of DPM and design a Diffusion Module (DIM), which generates user's embedding in target domain. To reduce the negative impact of randomness introduced in DIM and improve the stability, we employ an Alignment Module to produce the aligned user embeddings. In addition, we consider the label data of the target domain and form the task-oriented loss function, which enables our DiffCDR to adapt to specific tasks. By conducting extensive experiments on datasets collected from reality, we demonstrate the effectiveness and adaptability of DiffCDR to outperform baseline models on various CDR tasks in both cold-start and warm-start scenarios.
△ Less
Submitted 3 February, 2024;
originally announced February 2024.
-
Can Users Correctly Interpret Machine Learning Explanations and Simultaneously Identify Their Limitations?
Authors:
Yueqing Xuan,
Edward Small,
Kacper Sokol,
Danula Hettiachchi,
Mark Sanderson
Abstract:
Automated decision-making systems are becoming increasingly ubiquitous, motivating an immediate need for their explainability. However, it remains unclear whether users know what insights an explanation offers and, more importantly, what information it lacks. We conducted an online study with 200 participants to assess explainees' ability to realise known and unknown information for four represent…
▽ More
Automated decision-making systems are becoming increasingly ubiquitous, motivating an immediate need for their explainability. However, it remains unclear whether users know what insights an explanation offers and, more importantly, what information it lacks. We conducted an online study with 200 participants to assess explainees' ability to realise known and unknown information for four representative explanations: transparent modelling, decision boundary visualisation, counterfactual explainability and feature importance. Our findings demonstrate that feature importance and decision boundary visualisation are the most comprehensible, but their limitations are not necessarily recognised by the users. In addition, correct interpretation of an explanation -- i.e., understanding known information -- is accompanied by high confidence, but a failure to gauge its limits -- thus grasp unknown information -- yields overconfidence; the latter phenomenon is especially prominent for feature importance and transparent modelling. Machine learning explanations should therefore embrace their richness and limitations to maximise understanding and curb misinterpretation.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
SpecTracle: Wearable Facial Motion Tracking from Unobtrusive Peripheral Cameras
Authors:
Yinan Xuan,
Varun Viswanath,
Sunny Chu,
Owen Bartolf,
Jessica Echterhoff,
Edward Wang
Abstract:
Facial motion tracking in head-mounted displays (HMD) has the potential to enable immersive "face-to-face" interaction in a virtual environment. However, current works on facial tracking are not suitable for unobtrusive augmented reality (AR) glasses or do not have the ability to track arbitrary facial movements. In this work, we demonstrate a novel system called SpecTracle that tracks a user's fa…
▽ More
Facial motion tracking in head-mounted displays (HMD) has the potential to enable immersive "face-to-face" interaction in a virtual environment. However, current works on facial tracking are not suitable for unobtrusive augmented reality (AR) glasses or do not have the ability to track arbitrary facial movements. In this work, we demonstrate a novel system called SpecTracle that tracks a user's facial motions using two wide-angle cameras mounted right next to the visor of a Hololens. Avoiding the usage of cameras extended in front of the face, our system greatly improves the feasibility to integrate full-face tracking into a low-profile form factor. We also demonstrate that a neural network-based model processing the wide-angle cameras can run in real-time at 24 frames per second (fps) on a mobile GPU and track independent facial movement for different parts of the face with a user-independent model. Using a short personalized calibration, the system improves its tracking performance by 42.3% compared to the user-independent model.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud Registration
Authors:
Gongxin Yao,
Yixin Xuan,
Yiwei Chen,
Yu Pan
Abstract:
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud, serving as a general solution for locating 3D objects from 2D observations. Matching individual points with pixels can be inherently ambiguous due to modality gaps. To address this challenge, we propose a framework to capture quantity-aware correspondences between local po…
▽ More
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud, serving as a general solution for locating 3D objects from 2D observations. Matching individual points with pixels can be inherently ambiguous due to modality gaps. To address this challenge, we propose a framework to capture quantity-aware correspondences between local point sets and pixel patches and refine the results at both the point and pixel levels. This framework aligns the high-level semantics of point sets and pixel patches to improve the matching accuracy. On a coarse scale, the set-to-patch correspondence is expected to be influenced by the quantity of 3D points. To achieve this, a novel supervision strategy is proposed to adaptively quantify the degrees of correlation as continuous values. On a finer scale, point-to-pixel correspondences are refined from a smaller search space through a well-designed scheme, which incorporates both resampling and quantity-aware priors. Particularly, a confidence sorting strategy is proposed to proportionally select better correspondences at the final stage. Leveraging the advantages of high-quality correspondences, the problem is successfully resolved using an efficient Perspective-n-Point solver within the framework of random sample consensus (RANSAC). Extensive experiments on the KITTI Odometry and NuScenes datasets demonstrate the superiority of our method over the state-of-the-art methods.
△ Less
Submitted 18 January, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Sphere2Vec: A General-Purpose Location Representation Learning over a Spherical Surface for Large-Scale Geospatial Predictions
Authors:
Gengchen Mai,
Yao Xuan,
Wenyun Zuo,
Yutong He,
Jiaming Song,
Stefano Ermon,
Krzysztof Janowicz,
Ni Lao
Abstract:
Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and has been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D…
▽ More
Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and has been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D location encoders are designed to model point distances in Euclidean space. So when applied to large-scale real-world GPS coordinate datasets, which require distance metric learning on the spherical surface, both types of models can fail due to the map projection distortion problem (2D) and the spherical-to-Euclidean distance approximation error (3D). To solve these problems, we propose a multi-scale location encoder called Sphere2Vec which can preserve spherical distances when encoding point coordinates on a spherical surface. We developed a unified view of distance-reserving encoding on spheres based on the DFS. We also provide theoretical proof that the Sphere2Vec preserves the spherical surface distance between any two points, while existing encoding schemes do not. Experiments on 20 synthetic datasets show that Sphere2Vec can outperform all baseline models on all these datasets with up to 30.8% error rate reduction. We then apply Sphere2Vec to three geo-aware image classification tasks - fine-grained species recognition, Flickr image recognition, and remote sensing image classification. Results on 7 real-world datasets show the superiority of Sphere2Vec over multiple location encoders on all three tasks. Further analysis shows that Sphere2Vec outperforms other location encoder models, especially in the polar regions and data-sparse areas because of its nature for spherical surface distance preservation. Code and data are available at https://gengchenmai.github.io/sphere2vec-website/.
△ Less
Submitted 2 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Practical and General Backdoor Attacks against Vertical Federated Learning
Authors:
Yuexin Xuan,
Xiaojun Chen,
Zhendong Zhao,
Bisheng Tang,
Ye Dong
Abstract:
Federated learning (FL), which aims to facilitate data collaboration across multiple organizations without exposing data privacy, encounters potential security risks. One serious threat is backdoor attacks, where an attacker injects a specific trigger into the training dataset to manipulate the model's prediction. Most existing FL backdoor attacks are based on horizontal federated learning (HFL),…
▽ More
Federated learning (FL), which aims to facilitate data collaboration across multiple organizations without exposing data privacy, encounters potential security risks. One serious threat is backdoor attacks, where an attacker injects a specific trigger into the training dataset to manipulate the model's prediction. Most existing FL backdoor attacks are based on horizontal federated learning (HFL), where the data owned by different parties have the same features. However, compared to HFL, backdoor attacks on vertical federated learning (VFL), where each party only holds a disjoint subset of features and the labels are only owned by one party, are rarely studied. The main challenge of this attack is to allow an attacker without access to the data labels, to perform an effective attack. To this end, we propose BadVFL, a novel and practical approach to inject backdoor triggers into victim models without label information. BadVFL mainly consists of two key steps. First, to address the challenge of attackers having no knowledge of labels, we introduce a SDD module that can trace data categories based on gradients. Second, we propose a SDP module that can improve the attack's effectiveness by enhancing the decision dependency between the trigger and attack target. Extensive experiments show that BadVFL supports diverse datasets and models, and achieves over 93% attack success rate with only 1% poisoning rate.
△ Less
Submitted 19 June, 2023;
originally announced June 2023.
-
Navigating Explanatory Multiverse Through Counterfactual Path Geometry
Authors:
Kacper Sokol,
Edward Small,
Yueqing Xuan
Abstract:
Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to algorithmic and domain-specific constraints -- such as density-based feasibility, and attribute (im)mutability or directionality of change -- that aim to maximise their real-life utility. In addition to desiderata with respect to the coun…
▽ More
Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to algorithmic and domain-specific constraints -- such as density-based feasibility, and attribute (im)mutability or directionality of change -- that aim to maximise their real-life utility. In addition to desiderata with respect to the counterfactual instance itself, existence of a viable path connecting it with the factual data point, known as algorithmic recourse, has become an important technical consideration. While both of these requirements ensure that the steps of the journey as well as its destination are admissible, current literature neglects the multiplicity of such counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys. We then show how to navigate, reason about and compare the geometry of these trajectories with two methods: vector spaces and graphs. To this end, we overview their spacial properties -- such as affinity, branching, divergence and possible future convergence -- and propose an all-in-one metric, called opportunity potential, to quantify them. Implementing this (possibly interactive) explanatory process grants explainees agency by allowing them to select counterfactuals based on the properties of the journey leading to them in addition to their absolute differences. We show the flexibility, benefit and efficacy of such an approach through examples and quantitative evaluation on the German Credit and MNIST data sets.
△ Less
Submitted 3 May, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
More Is Less: When Do Recommenders Underperform for Data-rich Users?
Authors:
Yueqing Xuan,
Kacper Sokol,
Jeffrey Chan,
Mark Sanderson
Abstract:
Users of recommender systems tend to differ in their level of interaction with these algorithms, which may affect the quality of recommendations they receive and lead to undesirable performance disparity. In this paper we investigate under what conditions the performance for data-rich and data-poor users diverges for a collection of popular evaluation metrics applied to ten benchmark datasets. We…
▽ More
Users of recommender systems tend to differ in their level of interaction with these algorithms, which may affect the quality of recommendations they receive and lead to undesirable performance disparity. In this paper we investigate under what conditions the performance for data-rich and data-poor users diverges for a collection of popular evaluation metrics applied to ten benchmark datasets. We find that Precision is consistently higher for data-rich users across all the datasets; Mean Average Precision is comparable across user groups but its variance is large; Recall yields a counter-intuitive result where the algorithm performs better for data-poor than for data-rich users, which bias is further exacerbated when negative item sampling is employed during evaluation. The final observation suggests that as users interact more with recommender systems, the quality of recommendations they receive degrades (when measured by Recall). Our insights clearly show the importance of an evaluation protocol and its influence on the reported results when studying recommender systems.
△ Less
Submitted 15 April, 2023;
originally announced April 2023.
-
Helpful, Misleading or Confusing: How Humans Perceive Fundamental Building Blocks of Artificial Intelligence Explanations
Authors:
Edward Small,
Yueqing Xuan,
Danula Hettiachchi,
Kacper Sokol
Abstract:
Explainable artificial intelligence techniques are developed at breakneck speed, but suitable evaluation approaches lag behind. With explainers becoming increasingly complex and a lack of consensus on how to assess their utility, it is challenging to judge the benefit and effectiveness of different explanations. To address this gap, we take a step back from sophisticated predictive algorithms and…
▽ More
Explainable artificial intelligence techniques are developed at breakneck speed, but suitable evaluation approaches lag behind. With explainers becoming increasingly complex and a lack of consensus on how to assess their utility, it is challenging to judge the benefit and effectiveness of different explanations. To address this gap, we take a step back from sophisticated predictive algorithms and instead look into explainability of simple decision-making models. In this setting, we aim to assess how people perceive comprehensibility of their different representations such as mathematical formulation, graphical representation and textual summarisation (of varying complexity and scope). This allows us to capture how diverse stakeholders -- engineers, researchers, consumers, regulators and the like -- judge intelligibility of fundamental concepts that more elaborate artificial intelligence explanations are built from. This position paper charts our approach to establishing appropriate evaluation methodology as well as a conceptual and practical framework to facilitate setting up and executing relevant user studies.
△ Less
Submitted 15 April, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
See or Hear? Exploring the Effect of Visual and Audio Hints and Gaze-assisted Task Feedback for Visual Search Tasks in Augmented Reality
Authors:
Yuchong Zhang,
Adam Nowak,
Yueming Xuan,
Andrzej Romanowski,
Morten Fjeld
Abstract:
Augmented reality (AR) is emerging in visual search tasks for increasingly immersive interactions with virtual objects. We propose an AR approach providing visual and audio hints along with gaze-assisted instant post-task feedback for search tasks based on mobile head-mounted display (HMD). The target case was a book-searching task, in which we aimed to explore the effect of the hints together wit…
▽ More
Augmented reality (AR) is emerging in visual search tasks for increasingly immersive interactions with virtual objects. We propose an AR approach providing visual and audio hints along with gaze-assisted instant post-task feedback for search tasks based on mobile head-mounted display (HMD). The target case was a book-searching task, in which we aimed to explore the effect of the hints together with the task feedback with two hypotheses. H1: Since visual and audio hints can positively affect AR search tasks, the combination outperforms the individuals. H2: The gaze-assisted instant post-task feedback can positively affect AR search tasks. The proof-of-concept was demonstrated by an AR app in HMD and a comprehensive user study (n=96) consisting of two sub-studies, Study I (n=48) without task feedback and Study II (n=48) with task feedback. Following quantitative and qualitative analysis, our results partially verified H1 and completely verified H2, enabling us to conclude that the synthesis of visual and audio hints conditionally improves the AR visual search task efficiency when coupled with task feedback.
△ Less
Submitted 14 November, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
Playing with Data: An Augmented Reality Approach to Interact with Visualizations of Industrial Process Tomography
Authors:
Yuchong Zhang,
Yueming Xuan,
Rahul Yadav,
Adel Omrani,
Morten Fjeld
Abstract:
Industrial process tomography (IPT) is a specialized imaging technique widely used in industrial scenarios for process supervision and control. Today, augmented/mixed reality (AR/MR) is increasingly being adopted in many industrial occasions, even though there is still an obvious gap when it comes to IPT. To bridge this gap, we propose the first systematic AR approach using optical see-through (OS…
▽ More
Industrial process tomography (IPT) is a specialized imaging technique widely used in industrial scenarios for process supervision and control. Today, augmented/mixed reality (AR/MR) is increasingly being adopted in many industrial occasions, even though there is still an obvious gap when it comes to IPT. To bridge this gap, we propose the first systematic AR approach using optical see-through (OST) head mounted displays (HMDs) with comparative evaluation for domain users towards IPT visualization analysis. The proof-of-concept was demonstrated by a within-subject user study (n=20) with counterbalancing design. Both qualitative and quantitative measurements were investigated. The results showed that our AR approach outperformed conventional settings for IPT data visualization analysis in bringing higher understandability, reduced task completion time, lower error rates for domain tasks, increased usability with enhanced user experience, and a better recommendation level. We summarize the findings and suggest future research directions for benefiting IPT users with AR/MR.
△ Less
Submitted 10 March, 2024; v1 submitted 3 February, 2023;
originally announced February 2023.
-
Machine Learning and Polymer Self-Consistent Field Theory in Two Spatial Dimensions
Authors:
Yao Xuan,
Kris T. Delaney,
Hector D. Ceniceros,
Glenn H. Fredrickson
Abstract:
A computational framework that leverages data from self-consistent field theory simulations with deep learning to accelerate the exploration of parameter space for block copolymers is presented. This is a substantial two-dimensional extension of the framework introduced in [1]. Several innovations and improvements are proposed. (1) A Sobolev space-trained, convolutional neural network (CNN) is emp…
▽ More
A computational framework that leverages data from self-consistent field theory simulations with deep learning to accelerate the exploration of parameter space for block copolymers is presented. This is a substantial two-dimensional extension of the framework introduced in [1]. Several innovations and improvements are proposed. (1) A Sobolev space-trained, convolutional neural network (CNN) is employed to handle the exponential dimension increase of the discretized, local average monomer density fields and to strongly enforce both spatial translation and rotation invariance of the predicted, field-theoretic intensive Hamiltonian. (2) A generative adversarial network (GAN) is introduced to efficiently and accurately predict saddle point, local average monomer density fields without resorting to gradient descent methods that employ the training set. This GAN approach yields important savings of both memory and computational cost. (3) The proposed machine learning framework is successfully applied to 2D cell size optimization as a clear illustration of its broad potential to accelerate the exploration of parameter space for discovering polymer nanostructures. Extensions to three-dimensional phase discovery appear to be feasible.
△ Less
Submitted 3 July, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Towards General-Purpose Representation Learning of Polygonal Geometries
Authors:
Gengchen Mai,
Chiyu Jiang,
Weiwei Sun,
Rui Zhu,
Yao Xuan,
Ling Cai,
Krzysztof Janowicz,
Stefano Ermon,
Ni Lao
Abstract:
Neural network representation learning for spatial data is a common need for geographic artificial intelligence (GeoAI) problems. In recent years, many advancements have been made in representation learning for points, polylines, and networks, whereas little progress has been made for polygons, especially complex polygonal geometries. In this work, we focus on develo** a general-purpose polygon…
▽ More
Neural network representation learning for spatial data is a common need for geographic artificial intelligence (GeoAI) problems. In recent years, many advancements have been made in representation learning for points, polylines, and networks, whereas little progress has been made for polygons, especially complex polygonal geometries. In this work, we focus on develo** a general-purpose polygon encoding model, which can encode a polygonal geometry (with or without holes, single or multipolygons) into an embedding space. The result embeddings can be leveraged directly (or finetuned) for downstream tasks such as shape classification, spatial relation prediction, and so on. To achieve model generalizability guarantees, we identify a few desirable properties: loop origin invariance, trivial vertex invariance, part permutation invariance, and topology awareness. We explore two different designs for the encoder: one derives all representations in the spatial domain; the other leverages spectral domain representations. For the spatial domain approach, we propose ResNet1D, a 1D CNN-based polygon encoder, which uses circular padding to achieve loop origin invariance on simple polygons. For the spectral domain approach, we develop NUFTspec based on Non-Uniform Fourier Transformation (NUFT), which naturally satisfies all the desired properties. We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets - DBSR-46K and DBSR-cplx46K. Our results show that NUFTspec and ResNet1D outperform multiple existing baselines with significant margins. While ResNet1D suffers from model performance degradation after shape-invariance geometry modifications, NUFTspec is very robust to these modifications due to the nature of the NUFT.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
Pandemic Control, Game Theory and Machine Learning
Authors:
Yao Xuan,
Robert Balkin,
Jiequn Han,
Ruimeng Hu,
Hector D. Ceniceros
Abstract:
Game theory has been an effective tool in the control of disease spread and in suggesting optimal policies at both individual and area levels. In this AMS Notices article, we focus on the decision-making development for the intervention of COVID-19, aiming to provide mathematical models and efficient machine learning methods, and justifications for related policies that have been implemented in th…
▽ More
Game theory has been an effective tool in the control of disease spread and in suggesting optimal policies at both individual and area levels. In this AMS Notices article, we focus on the decision-making development for the intervention of COVID-19, aiming to provide mathematical models and efficient machine learning methods, and justifications for related policies that have been implemented in the past and explain how the authorities' decisions affect their neighboring regions from a game theory viewpoint.
△ Less
Submitted 18 August, 2022;
originally announced August 2022.
-
Label Matching Semi-Supervised Object Detection
Authors:
Binbin Chen,
Weijie Chen,
Shicai Yang,
Yunyi Xuan,
Jie Song,
Di Xie,
Shiliang Pu,
Mingli Song,
Yueting Zhuang
Abstract:
Semi-supervised object detection has made significant progress with the development of mean teacher driven self-training. Despite the promising results, the label mismatch problem is not yet fully explored in the previous works, leading to severe confirmation bias during self-training. In this paper, we delve into this problem and propose a simple yet effective LabelMatch framework from two differ…
▽ More
Semi-supervised object detection has made significant progress with the development of mean teacher driven self-training. Despite the promising results, the label mismatch problem is not yet fully explored in the previous works, leading to severe confirmation bias during self-training. In this paper, we delve into this problem and propose a simple yet effective LabelMatch framework from two different yet complementary perspectives, i.e., distribution-level and instance-level. For the former one, it is reasonable to approximate the class distribution of the unlabeled data from that of the labeled data according to Monte Carlo Sampling. Guided by this weakly supervision cue, we introduce a re-distribution mean teacher, which leverages adaptive label-distribution-aware confidence thresholds to generate unbiased pseudo labels to drive student learning. For the latter one, there exists an overlooked label assignment ambiguity problem across teacher-student models. To remedy this issue, we present a novel label assignment mechanism for self-training framework, namely proposal self-assignment, which injects the proposals from student into teacher and generates accurate pseudo labels to match each proposal in the student model accordingly. Experiments on both MS-COCO and PASCAL-VOC datasets demonstrate the considerable superiority of our proposed framework to other state-of-the-arts. Code will be available at https://github.com/hikvision-research/SSOD.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
Sphere2Vec: Multi-Scale Representation Learning over a Spherical Surface for Geospatial Predictions
Authors:
Gengchen Mai,
Yao Xuan,
Wenyun Zuo,
Krzysztof Janowicz,
Ni Lao
Abstract:
Generating learning-friendly representations for points in a 2D space is a fundamental and long-standing problem in machine learning. Recently, multi-scale encoding schemes (such as Space2Vec) were proposed to directly encode any point in 2D space as a high-dimensional vector, and has been successfully applied to various (geo)spatial prediction tasks. However, a map projection distortion problem r…
▽ More
Generating learning-friendly representations for points in a 2D space is a fundamental and long-standing problem in machine learning. Recently, multi-scale encoding schemes (such as Space2Vec) were proposed to directly encode any point in 2D space as a high-dimensional vector, and has been successfully applied to various (geo)spatial prediction tasks. However, a map projection distortion problem rises when applying location encoding models to large-scale real-world GPS coordinate datasets (e.g., species images taken all over the world) - all current location encoding models are designed for encoding points in a 2D (Euclidean) space but not on a spherical surface, e.g., earth surface. To solve this problem, we propose a multi-scale location encoding model called Sphere2V ec which directly encodes point coordinates on a spherical surface while avoiding the mapprojection distortion problem. We provide theoretical proof that the Sphere2Vec encoding preserves the spherical surface distance between any two points. We also developed a unified view of distance-reserving encoding on spheres based on the Double Fourier Sphere (DFS). We apply Sphere2V ec to the geo-aware image classification task. Our analysis shows that Sphere2V ec outperforms other 2D space location encoder models especially on the polar regions and data-sparse areas for image classification tasks because of its nature for spherical surface distance preservation.
△ Less
Submitted 25 January, 2022;
originally announced January 2022.
-
Optimal Policies for a Pandemic: A Stochastic Game Approach and a Deep Learning Algorithm
Authors:
Yao Xuan,
Robert Balkin,
Jiequn Han,
Ruimeng Hu,
Hector D. Ceniceros
Abstract:
Game theory has been an effective tool in the control of disease spread and in suggesting optimal policies at both individual and area levels. In this paper, we propose a multi-region SEIR model based on stochastic differential game theory, aiming to formulate optimal regional policies for infectious diseases. Specifically, we enhance the standard epidemic SEIR model by taking into account the soc…
▽ More
Game theory has been an effective tool in the control of disease spread and in suggesting optimal policies at both individual and area levels. In this paper, we propose a multi-region SEIR model based on stochastic differential game theory, aiming to formulate optimal regional policies for infectious diseases. Specifically, we enhance the standard epidemic SEIR model by taking into account the social and health policies issued by multiple region planners. This enhancement makes the model more realistic and powerful. However, it also introduces a formidable computational challenge due to the high dimensionality of the solution space brought by the presence of multiple regions. This significant numerical difficulty of the model structure motivates us to generalize the deep fictitious algorithm introduced in [Han and Hu, MSML2020, pp.221--245, PMLR, 2020] and develop an improved algorithm to overcome the curse of dimensionality. We apply the proposed model and algorithm to study the COVID-19 pandemic in three states: New York, New Jersey, and Pennsylvania. The model parameters are estimated from real data posted by the Centers for Disease Control and Prevention (CDC). We are able to show the effects of the lockdown/travel ban policy on the spread of COVID-19 for each state and how their policies affect each other.
△ Less
Submitted 8 March, 2021; v1 submitted 12 December, 2020;
originally announced December 2020.
-
Heterogeneous Collaborative Filtering
Authors:
Yifang Liu,
Zhentao Xu,
Cong Hui,
Yi Xuan,
Jessie Chen,
Yuanming Shan
Abstract:
Recommendation system is important to a content sharing/creating social network. Collaborative filtering is a widely-adopted technology in conventional recommenders, which is based on similarity between positively engaged content items involving the same users. Conventional collaborative filtering (CCF) suffers from cold start problem and narrow content diversity. We propose a new recommendation a…
▽ More
Recommendation system is important to a content sharing/creating social network. Collaborative filtering is a widely-adopted technology in conventional recommenders, which is based on similarity between positively engaged content items involving the same users. Conventional collaborative filtering (CCF) suffers from cold start problem and narrow content diversity. We propose a new recommendation approach, heterogeneous collaborative filtering (HCF) to tackle these challenges at the root, while kee** the strength of collaborative filtering. We present two implementation algorithms of HCF for content recommendation and content dissemination. Experiment results demonstrate that our approach improve the recommendation quality in a real world social network for content creating and sharing.
△ Less
Submitted 31 August, 2019;
originally announced September 2019.
-
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Authors:
Shiyang Li,
Xiaoyong **,
Yao Xuan,
Xiyou Zhou,
Wenhu Chen,
Yu-Xiang Wang,
Xifeng Yan
Abstract:
Time series forecasting is an important problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situation. In this paper, we propose to tackle such forecasting problem with Transformer [1]. Although impressed by its performance in our preliminary study, we found its two major weaknesses: (1) locality-agnostics: the point-wise dot-pr…
▽ More
Time series forecasting is an important problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situation. In this paper, we propose to tackle such forecasting problem with Transformer [1]. Although impressed by its performance in our preliminary study, we found its two major weaknesses: (1) locality-agnostics: the point-wise dot-product self-attention in canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: space complexity of canonical Transformer grows quadratically with sequence length $L$, making directly modeling long time series infeasible. In order to solve these two issues, we first propose convolutional self-attention by producing queries and keys with causal convolution so that local context can be better incorporated into attention mechanism. Then, we propose LogSparse Transformer with only $O(L(\log L)^{2})$ memory cost, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under constrained memory budget. Our experiments on both synthetic data and real-world datasets show that it compares favorably to the state-of-the-art.
△ Less
Submitted 3 January, 2020; v1 submitted 29 June, 2019;
originally announced July 2019.
-
A Multi-Agent Deep Reinforcement Learning based Spectrum Allocation Framework for D2D Communications
Authors:
Zheng Li,
Caili Guo,
Yidi Xuan
Abstract:
Device-to-device (D2D) communication has been recognized as a promising technique to improve spectrum efficiency. However, D2D transmission as an underlay causes severe interference, which imposes a technical challenge to spectrum allocation. Existing centralized schemes require global information, which can cause serious signaling overhead. While existing distributed solution requires frequent in…
▽ More
Device-to-device (D2D) communication has been recognized as a promising technique to improve spectrum efficiency. However, D2D transmission as an underlay causes severe interference, which imposes a technical challenge to spectrum allocation. Existing centralized schemes require global information, which can cause serious signaling overhead. While existing distributed solution requires frequent information exchange between users and cannot achieve global optimization. In this paper, a distributed spectrum allocation framework based on multi-agent deep reinforcement learning is proposed, named Neighbor-Agent Actor Critic (NAAC). NAAC uses neighbor users' historical information for centralized training but is executed distributedly without that information, which not only has no signal interaction during execution, but also utilizes cooperation between users to further optimize system performance. The simulation results show that the proposed framework can effectively reduce the outage probability of cellular links, improve the sum rate of D2D links and have good convergence.
△ Less
Submitted 17 December, 2019; v1 submitted 13 April, 2019;
originally announced April 2019.