Search | arXiv e-print repository

Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering

Authors: Yi Zhang, Lei Sang, Yiwen Zhang

Abstract: Intent modeling has attracted widespread attention in recommender systems. As the core motivation behind user selection of items, intent is crucial for elucidating recommendation results. The current mainstream modeling method is to abstract the intent into unknowable but learnable shared or non-shared parameters. Despite considerable progress, we argue that it still confronts the following challe… ▽ More Intent modeling has attracted widespread attention in recommender systems. As the core motivation behind user selection of items, intent is crucial for elucidating recommendation results. The current mainstream modeling method is to abstract the intent into unknowable but learnable shared or non-shared parameters. Despite considerable progress, we argue that it still confronts the following challenges: firstly, these methods only capture the coarse-grained aspects of intent, ignoring the fact that user-item interactions will be affected by collective and individual factors (e.g., a user may choose a movie because of its high box office or because of his own unique preferences); secondly, modeling believable intent is severely hampered by implicit feedback, which is incredibly sparse and devoid of true semantics. To address these challenges, we propose a novel recommendation framework designated as Bilateral Intent-guided Graph Collaborative Filtering (BIGCF). Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual intent-which signifies private preferences-and collective intent-which denotes overall awareness. To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy, and we implement the recommendation process through bilateral intent-guided graph reconstruction re-sampling. Finally, we propose graph contrastive regularization for both interaction and intent spaces to uniformize users, items, intents, and interactions in a self-supervised and non-augmented paradigm. Experimental results on three real-world datasets demonstrate the effectiveness of BIGCF compared with existing solutions. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures, accepted by SIGIR 2024

arXiv:2405.04942 [pdf, other]

Dual-domain Collaborative Denoising for Social Recommendation

Authors: Wenjie Chen, Yi Zhang, Honghao Li, Lei Sang, Yiwen Zhang

Abstract: Social recommendation leverages social network to complement user-item interaction data for recommendation task, aiming to mitigate the data sparsity issue in recommender systems. However, existing social recommendation methods encounter the following challenge: both social network and interaction data contain substaintial noise, and the propagation of such noise through Graph Neural Networks (GNN… ▽ More Social recommendation leverages social network to complement user-item interaction data for recommendation task, aiming to mitigate the data sparsity issue in recommender systems. However, existing social recommendation methods encounter the following challenge: both social network and interaction data contain substaintial noise, and the propagation of such noise through Graph Neural Networks (GNNs) not only fails to enhance recommendation performance but may also interfere with the model's normal training. Despite the importance of denoising for social network and interaction data, only a limited number of studies have considered the denoising for social network and all of them overlook that for interaction data, hindering the denoising effect and recommendation performance. Based on this, we propose a novel model called Dual-domain Collaborative Denoising for Social Recommendation ($\textbf{DCDSR}$). DCDSR comprises two primary modules: the structure-level collaborative denoising module and the embedding-space collaborative denoising module. In the structure-level collaborative denoising module, information from interaction domain is first employed to guide social network denoising. Subsequently, the denoised social network is used to supervise the denoising for interaction data. The embedding-space collaborative denoising module devotes to resisting the noise cross-domain diffusion problem through contrastive learning with dual-domain embedding collaborative perturbation. Additionally, a novel contrastive learning strategy, named Anchor-InfoNCE, is introduced to better harness the denoising capability of contrastive learning. Evaluating our model on three real-world datasets verifies that DCDSR has a considerable denoising effect, thus outperforms the state-of-the-art social recommendation methods. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 14 pages, 9 figures

arXiv:2405.03167 [pdf, other]

TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation

Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang, Yun Yang

Abstract: Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most of the current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are un… ▽ More Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most of the current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are undifferentiated, which may lead the model to learn a larger number of easy samples in a single-minded manner while ignoring a smaller number of hard samples, thus reducing the model's generalization ability; (2) differentiated feature interaction encoders are designed to capture different interactions information but receive consistent supervision signals, thereby limiting the effectiveness of the encoder. To bridge the identified gaps, this paper introduces a novel CTR prediction framework by integrating the plug-and-play Twin Focus (TF) Loss, Sample Selection Embedding Module (SSEM), and Dynamic Fusion Module (DFM), named the Twin Focus Framework for CTR (TF4CTR). Specifically, the framework employs the SSEM at the bottom of the model to differentiate between samples, thereby assigning a more suitable encoder for each sample. Meanwhile, the TF Loss provides tailored supervision signals to both simple and complex encoders. Moreover, the DFM dynamically fuses the feature interaction information captured by the encoders, resulting in more accurate predictions. Experiments on five real-world datasets confirm the effectiveness and compatibility of the framework, demonstrating its capacity to enhance various representative baselines in a model-agnostic manner. To facilitate reproducible research, our open-sourced code and detailed running logs will be made available at: https://github.com/salmon1802/TF4CTR. △ Less

Submitted 25 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.02810 [pdf, other]

Generative-Enhanced Heterogeneous Graph Contrastive Learning

Authors: Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang

Abstract: Heterogeneous Graphs (HGs) can effectively model complex relationships in the real world by multi-type nodes and edges. In recent years, inspired by self-supervised learning, contrastive Heterogeneous Graphs Neural Networks (HGNNs) have shown great potential by utilizing data augmentation and contrastive discriminators for downstream tasks. However, data augmentation is still limited due to the gr… ▽ More Heterogeneous Graphs (HGs) can effectively model complex relationships in the real world by multi-type nodes and edges. In recent years, inspired by self-supervised learning, contrastive Heterogeneous Graphs Neural Networks (HGNNs) have shown great potential by utilizing data augmentation and contrastive discriminators for downstream tasks. However, data augmentation is still limited due to the graph data's integrity. Furthermore, the contrastive discriminators remain sampling bias and lack local heterogeneous information. To tackle the above limitations, we propose a novel Generative-Enhanced Heterogeneous Graph Contrastive Learning (GHGCL). Specifically, we first propose a heterogeneous graph generative learning enhanced contrastive paradigm. This paradigm includes: 1) A contrastive view augmentation strategy by using a masked autoencoder. 2) Position-aware and semantics-aware positive sample sampling strategy for generating hard negative samples. 3) A hierarchical contrastive learning strategy for capturing local and global information. Furthermore, the hierarchical contrastive learning and sampling strategies aim to constitute an enhanced contrastive discriminator under the generative-contrastive perspective. Finally, we compare our model with seventeen baselines on eight real-world datasets. Our model outperforms the latest contrastive and generative baselines on node classification and link prediction tasks. To reproduce our work, we have open-sourced our code at https://anonymous.4open.science/r/GC-HGNN-E50C. △ Less

Submitted 7 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 11 pages, 8figures

arXiv:2403.03600 [pdf, other]

A Privacy-Preserving Framework with Multi-Modal Data for Cross-Domain Recommendation

Authors: Li Wang, Lei Sang, Quangui Zhang, Qiang Wu, Min Xu

Abstract: Cross-domain recommendation (CDR) aims to enhance recommendation accuracy in a target domain with sparse data by leveraging rich information in a source domain, thereby addressing the data-sparsity problem. Some existing CDR methods highlight the advantages of extracting domain-common and domain-specific features to learn comprehensive user and item representations. However, these methods can't ef… ▽ More Cross-domain recommendation (CDR) aims to enhance recommendation accuracy in a target domain with sparse data by leveraging rich information in a source domain, thereby addressing the data-sparsity problem. Some existing CDR methods highlight the advantages of extracting domain-common and domain-specific features to learn comprehensive user and item representations. However, these methods can't effectively disentangle these components as they often rely on simple user-item historical interaction information (such as ratings, clicks, and browsing), neglecting the rich multi-modal features. Additionally, they don't protect user-sensitive data from potential leakage during knowledge transfer between domains. To address these challenges, we propose a Privacy-Preserving Framework with Multi-Modal Data for Cross-Domain Recommendation, called P2M2-CDR. Specifically, we first design a multi-modal disentangled encoder that utilizes multi-modal information to disentangle more informative domain-common and domain-specific embeddings. Furthermore, we introduce a privacy-preserving decoder to mitigate user privacy leakage during knowledge transfer. Local differential privacy (LDP) is utilized to obfuscate the disentangled embeddings before inter-domain exchange, thereby enhancing privacy protection. To ensure both consistency and differentiation among these obfuscated disentangled embeddings, we incorporate contrastive learning-based domain-inter and domain-intra losses. Extensive Experiments conducted on four real-world datasets demonstrate that P2M2-CDR outperforms other state-of-the-art single-domain and cross-domain baselines. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2312.12750 [pdf, other]

Parallel Ranking of Ads and Creatives in Real-Time Advertising Systems

Authors: Zhiguang Yang, Lu Wang, Chun Gan, Liufang Sang, Haoran Wang, Wenlong Chen, Jie He, Chang** Peng, Zhangang Lin, **g** Shao

Abstract: "Creativity is the heart and soul of advertising services". Effective creatives can create a win-win scenario: advertisers can reach target users and achieve marketing objectives more effectively, users can more quickly find products of interest, and platforms can generate more advertising revenue. With the advent of AI-Generated Content, advertisers now can produce vast amounts of creative conten… ▽ More "Creativity is the heart and soul of advertising services". Effective creatives can create a win-win scenario: advertisers can reach target users and achieve marketing objectives more effectively, users can more quickly find products of interest, and platforms can generate more advertising revenue. With the advent of AI-Generated Content, advertisers now can produce vast amounts of creative content at a minimal cost. The current challenge lies in how advertising systems can select the most pertinent creative in real-time for each user personally. Existing methods typically perform serial ranking of ads or creatives, limiting the creative module in terms of both effectiveness and efficiency. In this paper, we propose for the first time a novel architecture for online parallel estimation of ads and creatives ranking, as well as the corresponding offline joint optimization model. The online architecture enables sophisticated personalized creative modeling while reducing overall latency. The offline joint model for CTR estimation allows mutual awareness and collaborative optimization between ads and creatives. Additionally, we optimize the offline evaluation metrics for the implicit feedback sorting task involved in ad creative ranking. We conduct extensive experiments to compare ours with two state-of-the-art approaches. The results demonstrate the effectiveness of our approach in both offline evaluations and real-world advertising platforms online in terms of response time, CTR, and CPM. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 9 pages, 4 figures, AAAI2024

arXiv:2312.09715 [pdf, other]

CETN: Contrast-enhanced Through Network for CTR Prediction

Authors: Honghao Li, Lei Sang, Yi Zhang, Xuyun Zhang, Yiwen Zhang

Abstract: Click-through rate (CTR) Prediction is a crucial task in personalized information retrievals, such as industrial recommender systems, online advertising, and web search. Most existing CTR Prediction models utilize explicit feature interactions to overcome the performance bottleneck of implicit feature interactions. Hence, deep CTR models based on parallel structures (e.g., DCN, FinalMLP, xDeepFM)… ▽ More Click-through rate (CTR) Prediction is a crucial task in personalized information retrievals, such as industrial recommender systems, online advertising, and web search. Most existing CTR Prediction models utilize explicit feature interactions to overcome the performance bottleneck of implicit feature interactions. Hence, deep CTR models based on parallel structures (e.g., DCN, FinalMLP, xDeepFM) have been proposed to obtain joint information from different semantic spaces. However, these parallel subcomponents lack effective supervisory signals, making it challenging to efficiently capture valuable multi-views feature interaction information in different semantic spaces. To address this issue, we propose a simple yet effective novel CTR model: Contrast-enhanced Through Network for CTR (CETN), so as to ensure the diversity and homogeneity of feature interaction information. Specifically, CETN employs product-based feature interactions and the augmentation (perturbation) concept from contrastive learning to segment different semantic spaces, each with distinct activation functions. This improves diversity in the feature interaction information captured by the model. Additionally, we introduce self-supervised signals and through connection within each semantic space to ensure the homogeneity of the captured feature interaction information. The experiments and research conducted on four real datasets demonstrate that our model consistently outperforms twenty baseline models in terms of AUC and Logloss. △ Less

Submitted 16 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2311.17810 [pdf, other]

Coloring the Past: Neural Historical Buildings Reconstruction from Archival Photography

Authors: David Komorowicz, Lu Sang, Ferdinand Maiwald, Daniel Cremers

Abstract: Historical buildings are a treasure and milestone of human cultural heritage. Reconstructing the 3D models of these building hold significant value. The rapid development of neural rendering methods makes it possible to recover the 3D shape only based on archival photographs. However, this task presents considerable challenges due to the limitations of such datasets. Historical photographs are oft… ▽ More Historical buildings are a treasure and milestone of human cultural heritage. Reconstructing the 3D models of these building hold significant value. The rapid development of neural rendering methods makes it possible to recover the 3D shape only based on archival photographs. However, this task presents considerable challenges due to the limitations of such datasets. Historical photographs are often limited in number and the scenes in these photos might have altered over time. The radiometric quality of these images is also often sub-optimal. To address these challenges, we introduce an approach to reconstruct the geometry of historical buildings, employing volumetric rendering techniques. We leverage dense point clouds as a geometric prior and introduce a color appearance embedding loss to recover the color of the building given limited available color images. We aim for our work to spark increased interest and focus on preserving historical buildings. Thus, we also introduce a new historical dataset of the Hungarian National Theater, providing a new benchmark for the reconstruction method. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.17634 [pdf, other]

Erasing the Ephemeral: Joint Camera Refinement and Transient Object Removal for Street View Synthesis

Authors: Mreenav Shyam Deka, Lu Sang, Daniel Cremers

Abstract: Synthesizing novel views for urban environments is crucial for tasks like autonomous driving and virtual tours. Compared to object-level or indoor situations, outdoor settings present unique challenges, such as inconsistency across frames due to moving vehicles and camera pose drift over lengthy sequences. In this paper, we introduce a method that tackles these challenges on view synthesis for out… ▽ More Synthesizing novel views for urban environments is crucial for tasks like autonomous driving and virtual tours. Compared to object-level or indoor situations, outdoor settings present unique challenges, such as inconsistency across frames due to moving vehicles and camera pose drift over lengthy sequences. In this paper, we introduce a method that tackles these challenges on view synthesis for outdoor scenarios. We employ a neural point light field scene representation and strategically detect and mask out dynamic objects to reconstruct novel scenes without artifacts. Moreover, we simultaneously optimize camera pose along with the view synthesis process, and thus, we simultaneously refine both elements. Through validation on real-world urban datasets, we demonstrate state-of-the-art results in synthesizing novel views of urban scenes. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15594 [pdf]

doi 10.1109/TSTE.2024.3355123

Networked Multiagent Safe Reinforcement Learning for Low-carbon Demand Management in Distribution Network

Authors: Jichen Zhang, Linwei Sang, Yinliang Xu, Hongbin Sun

Abstract: This paper proposes a multiagent based bi-level operation framework for the low-carbon demand management in distribution networks considering the carbon emission allowance on the demand side. In the upper level, the aggregate load agents optimize the control signals for various types of loads to maximize the profits; in the lower level, the distribution network operator makes optimal dispatching d… ▽ More This paper proposes a multiagent based bi-level operation framework for the low-carbon demand management in distribution networks considering the carbon emission allowance on the demand side. In the upper level, the aggregate load agents optimize the control signals for various types of loads to maximize the profits; in the lower level, the distribution network operator makes optimal dispatching decisions to minimize the operational costs and calculates the distribution locational marginal price and carbon intensity. The distributed flexible load agent has only incomplete information of the distribution network and cooperates with other agents using networked communication. Finally, the problem is formulated into a networked multi-agent constrained Markov decision process, which is solved using a safe reinforcement learning algorithm called consensus multi-agent constrained policy optimization considering the carbon emission allowance for each agent. Case studies with the IEEE 33-bus and 123-bus distribution network systems demonstrate the effectiveness of the proposed approach, in terms of satisfying the carbon emission constraint on demand side, ensuring the safe operation of the distribution network and preserving privacy of both sides. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: Submitted to IEEE Transactions on Sustainable Energy

arXiv:2306.02099 [pdf, other]

Enhancing Surface Neural Implicits with Curvature-Guided Sampling and Uncertainty-Augmented Representations

Authors: Lu Sang, Abhishek Saroha, Maolin Gao, Daniel Cremers

Abstract: Neural implicits have become popular for representing surfaces because they offer an adaptive resolution and support arbitrary topologies. While previous works rely on ground truth point clouds, they often ignore the effect of input quality and sampling methods during reconstructing process. In this paper, we introduce a sampling method with an uncertainty-augmented surface implicit representation… ▽ More Neural implicits have become popular for representing surfaces because they offer an adaptive resolution and support arbitrary topologies. While previous works rely on ground truth point clouds, they often ignore the effect of input quality and sampling methods during reconstructing process. In this paper, we introduce a sampling method with an uncertainty-augmented surface implicit representation that employs a sampling technique that considers the geometric characteristics of inputs. To this end, we introduce a strategy that efficiently computes differentiable geometric features, namely, mean curvatures, to augment the sampling phase during the training period. The uncertainty augmentation offers insights into the occupancy and reliability of the output signed distance value, thereby expanding representation capabilities into open surfaces. Finally, we demonstrate that our method leads to state-of-the-art reconstructions on both synthetic and real-world data. △ Less

Submitted 12 December, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

Comments: 8 pages

arXiv:2305.00362 [pdf, other]

Electricity Price Prediction for Energy Storage System Arbitrage: A Decision-focused Approach

Authors: Linwei Sang, Yinliang Xu, Huan Long, Qinran Hu, Hongbin Sun

Abstract: Electricity price prediction plays a vital role in energy storage system (ESS) management. Current prediction models focus on reducing prediction errors but overlook their impact on downstream decision-making. So this paper proposes a decision-focused electricity price prediction approach for ESS arbitrage to bridge the gap from the downstream optimization model to the prediction model. The decisi… ▽ More Electricity price prediction plays a vital role in energy storage system (ESS) management. Current prediction models focus on reducing prediction errors but overlook their impact on downstream decision-making. So this paper proposes a decision-focused electricity price prediction approach for ESS arbitrage to bridge the gap from the downstream optimization model to the prediction model. The decision-focused approach aims at utilizing the downstream arbitrage model for training prediction models. It measures the difference between actual decisions under the predicted price and oracle decisions under the true price, i.e., decision error, by regret, transforms it into the tractable surrogate regret, and then derives the gradients to predicted price for training prediction models. Based on the prediction and decision errors, this paper proposes the hybrid loss and corresponding stochastic gradient descent learning method to learn prediction models for prediction and decision accuracy. The case study verifies that the proposed approach can efficiently bring more economic benefits and reduce decision errors by flattening the time distribution of prediction errors, compared to prediction models for only minimizing prediction errors. △ Less

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2302.07706 [pdf, other]

doi 10.1109/JISPIN.2023.3337055

Bidirectional UWB Localization: A Review on an Elastic Positioning Scheme for GNSS-deprived Zones

Authors: Cung Lian Sang, Michael Adams, Marc Hesse, Ulrich Rückert

Abstract: A bidirectional Ultra-Wideband (UWB) localization scheme is one of the three widely adopted design integration processes commonly used in time-based UWB positioning systems. The key property of bidirectional UWB localization is its ability to serve both navigation and tracking tasks within a single localization scheme on demand. Traditionally, navigation and tracking in wireless localization syste… ▽ More A bidirectional Ultra-Wideband (UWB) localization scheme is one of the three widely adopted design integration processes commonly used in time-based UWB positioning systems. The key property of bidirectional UWB localization is its ability to serve both navigation and tracking tasks within a single localization scheme on demand. Traditionally, navigation and tracking in wireless localization systems were treated as separate entities due to distinct applicable use-cases and methodological needs in each implementation process. Therefore, the ability to flexibly or elastically combine two unique positioning perspectives (navigation and tracking) within a single scheme can be regarded as a paradigm shift in the way location-based services are conventionally observed. This article reviews the mentioned bidirectional UWB localization from the perspective of a flexible and versatile positioning topology and highlights its potential in the field. In this regard, the article comprehensively describes the complete system model of the bidirectional UWB localization scheme using modular processes. It also discusses the demonstrative evaluation of two system integration processes and conducts a SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis of the scheme. Furthermore, the prospect of the presented bidirectional localization scheme for achieving precise location estimation in 5G/6G wireless mobile networks, as well as in Wi-Fi fine-time measurement-based positioning systems was briefly discussed. △ Less

Submitted 21 December, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: 19 pages, 12 figures

Report number: http://nbn-resolving.de/urn:nbn:de:0070-pub-29690329 ACM Class: C.2.2; B.4.4; D.2.2; A.1; F.2.0

Journal ref: IEEE Journal of Indoor and Seamless Positioning and Navigation, 2023, vol. 1, pp. 161 - 179

arXiv:2210.12202 [pdf, other]

High-Quality RGB-D Reconstruction via Multi-View Uncalibrated Photometric Stereo and Gradient-SDF

Authors: Lu Sang, Bjoern Haefner, Xingxing Zuo, Daniel Cremers

Abstract: Fine-detailed reconstructions are in high demand in many applications. However, most of the existing RGB-D reconstruction methods rely on pre-calculated accurate camera poses to recover the detailed surface geometry, where the representation of a surface needs to be adapted when optimizing different quantities. In this paper, we present a novel multi-view RGB-D based reconstruction method that tac… ▽ More Fine-detailed reconstructions are in high demand in many applications. However, most of the existing RGB-D reconstruction methods rely on pre-calculated accurate camera poses to recover the detailed surface geometry, where the representation of a surface needs to be adapted when optimizing different quantities. In this paper, we present a novel multi-view RGB-D based reconstruction method that tackles camera pose, lighting, albedo, and surface normal estimation via the utilization of a gradient signed distance field (gradient-SDF). The proposed method formulates the image rendering process using specific physically-based model(s) and optimizes the surface's quantities on the actual surface using its volumetric representation, as opposed to other works which estimate surface quantities only near the actual surface. To validate our method, we investigate two physically-based image formation models for natural light and point light source applications. The experimental results on synthetic and real-world datasets demonstrate that the proposed method can recover high-quality geometry of the surface more faithfully than the state-of-the-art and further improves the accuracy of estimated camera poses. △ Less

Submitted 21 October, 2022; originally announced October 2022.

Comments: WACV 2023

arXiv:2111.13652 [pdf, other]

Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction

Authors: Christiane Sommer, Lu Sang, David Schubert, Daniel Cremers

Abstract: We present Gradient-SDF, a novel representation for 3D geometry that combines the advantages of implict and explicit representations. By storing at every voxel both the signed distance field as well as its gradient vector field, we enhance the capability of implicit representations with approaches originally formulated for explicit surfaces. As concrete examples, we show that (1) the Gradient-SDF… ▽ More We present Gradient-SDF, a novel representation for 3D geometry that combines the advantages of implict and explicit representations. By storing at every voxel both the signed distance field as well as its gradient vector field, we enhance the capability of implicit representations with approaches originally formulated for explicit surfaces. As concrete examples, we show that (1) the Gradient-SDF allows us to perform direct SDF tracking from depth images, using efficient storage schemes like hash maps, and that (2) the Gradient-SDF representation enables us to perform photometric bundle adjustment directly in a voxel representation (without transforming into a point cloud or mesh), naturally a fully implicit optimization of geometry and camera poses and easy geometry upsampling. Experimental results confirm that this leads to significantly sharper reconstructions. Since the overall SDF voxel structure is still respected, the proposed Gradient-SDF is equally suited for (GPU) parallelization as related approaches. △ Less

Submitted 26 November, 2021; originally announced November 2021.

Comments: First two authors contributed equally

arXiv:2109.01461 [pdf, other]

Dive into Layers: Neural Network Capacity Bounding using Algebraic Geometry

Authors: Ji Yang, Lu Sang, Daniel Cremers

Abstract: The empirical results suggest that the learnability of a neural network is directly related to its size. To mathematically prove this, we borrow a tool in topological algebra: Betti numbers to measure the topological geometric complexity of input data and the neural network. By characterizing the expressive capacity of a neural network with its topological complexity, we conduct a thorough analysi… ▽ More The empirical results suggest that the learnability of a neural network is directly related to its size. To mathematically prove this, we borrow a tool in topological algebra: Betti numbers to measure the topological geometric complexity of input data and the neural network. By characterizing the expressive capacity of a neural network with its topological complexity, we conduct a thorough analysis and show that the network's expressive capacity is limited by the scale of its layers. Further, we derive the upper bounds of the Betti numbers on each layer within the network. As a result, the problem of architecture selection of a neural network is transformed to determining the scale of the network that can represent the input data complexity. With the presented results, the architecture selection of a fully connected network boils down to choosing a suitable size of the network such that it equips the Betti numbers that are not smaller than the Betti numbers of the input data. We perform the experiments on a real-world dataset MNIST and the results verify our analysis and conclusion. The code is publicly available. △ Less

Submitted 4 November, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

arXiv:1912.06501 [pdf, other]

Inferring Super-Resolution Depth from a Moving Light-Source Enhanced RGB-D Sensor: A Variational Approach

Authors: Lu Sang, Bjoern Haefner, Daniel Cremers

Abstract: A novel approach towards depth map super-resolution using multi-view uncalibrated photometric stereo is presented. Practically, an LED light source is attached to a commodity RGB-D sensor and is used to capture objects from multiple viewpoints with unknown motion. This non-static camera-to-object setup is described with a nonconvex variational approach such that no calibration on lighting or camer… ▽ More A novel approach towards depth map super-resolution using multi-view uncalibrated photometric stereo is presented. Practically, an LED light source is attached to a commodity RGB-D sensor and is used to capture objects from multiple viewpoints with unknown motion. This non-static camera-to-object setup is described with a nonconvex variational approach such that no calibration on lighting or camera motion is required due to the formulation of an end-to-end joint optimization problem. Solving the proposed variational model results in high resolution depth, reflectance and camera pose estimates, as we show on challenging synthetic and real-world datasets. △ Less

Submitted 13 December, 2019; originally announced December 2019.

Comments: WACV2020 conference paper

arXiv:1803.09080 [pdf, other]

AAANE: Attention-based Adversarial Autoencoder for Multi-scale Network Embedding

Authors: Lei Sang, Min Xu, Shengsheng Qian, Xindong Wu

Abstract: Network embedding represents nodes in a continuous vector space and preserves structure information from the Network. Existing methods usually adopt a "one-size-fits-all" approach when concerning multi-scale structure information, such as first- and second-order proximity of nodes, ignoring the fact that different scales play different roles in the embedding learning. In this paper, we propose an… ▽ More Network embedding represents nodes in a continuous vector space and preserves structure information from the Network. Existing methods usually adopt a "one-size-fits-all" approach when concerning multi-scale structure information, such as first- and second-order proximity of nodes, ignoring the fact that different scales play different roles in the embedding learning. In this paper, we propose an Attention-based Adversarial Autoencoder Network Embedding(AAANE) framework, which promotes the collaboration of different scales and lets them vote for robust representations. The proposed AAANE consists of two components: 1) Attention-based autoencoder effectively capture the highly non-linear network structure, which can de-emphasize irrelevant scales during training. 2) An adversarial regularization guides the autoencoder learn robust representations by matching the posterior distribution of the latent embeddings to given prior distribution. This is the first attempt to introduce attention mechanisms to multi-scale network embedding. Experimental results on real-world networks show that our learned attention parameters are different for every network and the proposed approach outperforms existing state-of-the-art approaches for network embedding. △ Less

Submitted 24 March, 2018; originally announced March 2018.

Comments: 8 pages, 5 figures

Showing 1–18 of 18 results for author: Sang, L