-
Robotic Table Tennis: A Case Study into a High Speed Learning System
Authors:
David B. D'Ambrosio,
Jonathan Abelian,
Saminda Abeyruwan,
Michael Ahn,
Alex Bewley,
Justin Boyd,
Krzysztof Choromanski,
Omar Cortes,
Erwin Coumans,
Tianli Ding,
Wenbo Gao,
Laura Graesser,
Atil Iscen,
Navdeep Jaitly,
Deepali Jain,
Juhana Kangaspunta,
Satoshi Kataoka,
Gus Kouretas,
Yuheng Kuang,
Nevena Lazic,
Corey Lynch,
Reza Mahjourian,
Sherry Q. Moore,
Thinh Nguyen,
Ken Oslund
, et al. (10 additional authors not shown)
Abstract:
We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
Submitted 6 September, 2023;
originally announced September 2023.
-
Multi-Agent Search for a Moving and Camouflaging Target
Authors:
Miguel Lejeune,
Johannes O. Royset,
Wenbo Ma
Abstract:
In multi-agent search planning for a randomly moving and camouflaging target, we examine heterogeneous searchers that differ in terms of their endurance level, travel speed, and detection ability. This leads to a convex mixed-integer nonlinear program, which we reformulate using three linearization techniques. We develop preprocessing steps, outer approximations via lazy constraints, and bundle-based cutting plane methods to address large-scale instances. Further specializations emerge when the target moves according to a Markov chain. We carry out an extensive numerical study to show the computational efficiency of our methods and to derive insights regarding which approach should be favored for which type of problem instance.
Submitted 1 November, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment
Authors:
Pengcheng Dong,
Xiao** Mao,
Lixia Fan,
Wenbo Wan,
Jiande Sun
Abstract:
In recent years, the assessment of fundamental movement skills integrated with physical education has focused on both teaching practice and the feasibility of assessment. The object of assessment has shifted from multiple ages to subdivided ages, while the content of assessment has changed from complex and time-consuming to concise and efficient. Therefore, we apply deep learning to physical fitness evaluation and propose the CAMSA Physical Fitness Evaluation System (CPFES), a system based on the Canadian Agility and Movement Skill Assessment (CAMSA) that evaluates children's physical fitness and gives recommendations based on the resulting scores to help children grow. We have designed a landmark detection module and a pose estimation module, and we have also designed a pose evaluation module for the CAMSA criteria that can effectively evaluate the actions of the child being tested. Our experimental results demonstrate the high accuracy of the proposed system.
Submitted 28 August, 2023;
originally announced August 2023.
-
WSTac: Interactive Surface Perception based on Whisker-Inspired and Self-Illuminated Vision-Based Tactile Sensor
Authors:
Kai Chong Lei,
Kit Wa Sou,
Wang Sing Chan,
Jiayi Yan,
Siqi **,
Dengfeng Peng,
Wenbo Ding,
Xiao-** Zhang
Abstract:
Modern Visual-Based Tactile Sensors (VBTSs) use cost-effective cameras to track elastomer deformation, but struggle with ambient light interference. Solutions typically involve using internal LEDs and blocking external light, thus adding complexity. Creating a VBTS resistant to ambient light with just a camera and an elastomer remains a challenge. In this work, we introduce WSTac, a self-illuminating VBTS comprising a mechanoluminescence (ML) whisker elastomer, camera, and 3D printed parts. The ML whisker elastomer, inspired by the touch sensitivity of vibrissae, offers both light isolation and high ML intensity under stress, thereby removing the necessity for additional LED modules. With the incorporation of machine learning, the sensor effectively utilizes the dynamic contact variations of 25 whiskers to successfully perform tasks like speed regression, directional identification, and texture classification. Videos are available at: https://sites.google.com/view/wstac/.
Submitted 25 August, 2023;
originally announced August 2023.
-
Exploring Transferability of Multimodal Adversarial Samples for Vision-Language Pre-training Models with Contrastive Learning
Authors:
Youze Wang,
Wenbo Hu,
Yinpeng Dong,
Hanwang Zhang,
Richang Hong
Abstract:
Vision-language pre-training models (VLP) are vulnerable, especially to multimodal adversarial samples, which can be crafted by adding imperceptible perturbations on both original images and texts. However, under the black-box setting, there have been no works to explore the transferability of multimodal adversarial attacks against the VLP models. In this work, we take CLIP as the surrogate model and propose a gradient-based multimodal attack method to generate transferable adversarial samples against the VLP models. By applying the gradient to optimize the adversarial images and adversarial texts simultaneously, our method can better search for and attack the vulnerable images and text information pairs. To improve the transferability of the attack, we utilize contrastive learning including image-text contrastive learning and intra-modal contrastive learning to have a more generalized understanding of the underlying data distribution and mitigate the overfitting of the surrogate model so that the generated multimodal adversarial samples have a higher transferability for VLP models. Extensive experiments validate the effectiveness of the proposed method.
Submitted 4 November, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Masked Cross-image Encoding for Few-shot Segmentation
Authors:
Wenbo Xu,
Huaxi Huang,
Ming Cheng,
Litao Yu,
Qiang Wu,
Jian Zhang
Abstract:
Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images. The key challenge in FSS is to classify the labels of query pixels using class prototypes learned from the few labeled support exemplars. Prior approaches to FSS have typically focused on learning class-wise descriptors independently from support images, thereby ignoring the rich contextual information and mutual dependencies among support-query features. To address this limitation, we propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction. MCE is more than a visual representation enrichment module; it also considers cross-image mutual dependencies and implicit guidance. Experiments on FSS benchmarks PASCAL-$5^i$ and COCO-$20^i$ demonstrate the advanced meta-learning ability of the proposed method.
Submitted 22 August, 2023;
originally announced August 2023.
-
Texture Generation on 3D Meshes with Point-UV Diffusion
Authors:
Xin Yu,
Peng Dai,
Wenbo Li,
Lan Ma,
Zhengzhe Liu,
Xiaojuan Qi
Abstract:
In this work, we focus on synthesizing high-quality textures on 3D meshes. We present Point-UV diffusion, a coarse-to-fine pipeline that marries the denoising diffusion model with UV mapping to generate 3D consistent and high-quality texture images in UV space. We start with introducing a point diffusion model to synthesize low-frequency texture components with our tailored style guidance to tackle the biased color distribution. The derived coarse texture offers global consistency and serves as a condition for the subsequent UV diffusion stage, aiding in regularizing the model to generate a 3D consistent UV texture image. Then, a UV diffusion model with hybrid conditions is developed to enhance the texture fidelity in the 2D UV space. Our method can process meshes of any genus, generating diversified, geometry-compatible, and high-fidelity textures. Code is available at https://cvmi-lab.github.io/Point-UV-Diffusion
Submitted 21 August, 2023;
originally announced August 2023.
-
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Authors:
Wenbo Hu,
Yifan Xu,
Yi Li,
Weiyue Li,
Zeyuan Chen,
Zhuowen Tu
Abstract:
Vision Language Models (VLMs), which extend Large Language Models (LLMs) by incorporating visual understanding capability, have demonstrated significant advancements in addressing open-ended visual question-answering (VQA) tasks. However, these models cannot accurately interpret images infused with text, a common occurrence in real-world scenarios. Standard procedures for extracting information from images often involve learning a fixed set of query embeddings. These embeddings are designed to encapsulate image contexts and are later used as soft prompt inputs in LLMs. Yet, this process is limited to the token count, potentially curtailing the recognition of scenes with text-rich context. To improve upon them, the present study introduces BLIVA: an augmented version of InstructBLIP with Visual Assistant. BLIVA incorporates the query embeddings from InstructBLIP and also directly projects encoded patch embeddings into the LLM, a technique inspired by LLaVA. This approach helps the model capture intricate details potentially missed during the query decoding process. Empirical evidence demonstrates that our model, BLIVA, significantly enhances performance in processing text-rich VQA benchmarks (up to 17.76% in OCR-VQA benchmark) and in undertaking general (not particularly text-rich) VQA benchmarks (up to 7.9% in Visual Spatial Reasoning benchmark), and achieves a 17.72% overall improvement in a comprehensive multimodal LLM benchmark (MME), compared to our baseline InstructBLIP. BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence. To demonstrate the broad industry applications enabled by BLIVA, we evaluate the model using a new dataset comprising YouTube thumbnails paired with question-answer sets across 11 diverse categories. Our code and models are freely accessible at https://github.com/mlpc-ucsd/BLIVA.
Submitted 17 December, 2023; v1 submitted 19 August, 2023;
originally announced August 2023.
-
TRTM: Template-based Reconstruction and Target-oriented Manipulation of Crumpled Cloths
Authors:
Wenbo Wang,
Gen Li,
Miguel Zamora,
Stelian Coros
Abstract:
Precise reconstruction and manipulation of crumpled cloths is challenging due to the high dimensionality of cloth models, as well as the limited observation at self-occluded regions. We leverage recent progress in single-view human reconstruction to perform template-based reconstruction of crumpled cloths from their top-view depth observations only, with our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstruction mesh explicitly describes the positions and visibilities of the entire cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulations. Experiments demonstrate that our TRTM system can be applied to daily cloths that have similar topologies to our template mesh, but with different shapes, sizes, patterns, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: https://wenbwa.github.io/TRTM/ .
Submitted 15 May, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks
Authors:
Hengxi Zhang,
Huaze Tang,
Wenbo Ding,
Xiao-** Zhang
Abstract:
The Space-Air-Ground Integrated Network (SAGIN), integrating heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. However, resource management of the SAGIN is a challenge requiring urgent study in that inappropriate resource management will cause poor data transmission, and hence affect the services in smart cities. In this paper, we develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method to address the resource management issue. The experimental results highlight the efficacy of the proposed CMT-MARL, as evidenced by key performance indicators such as the overall transmission rate and transmission success rate. These results underscore the potential value and feasibility of future implementation of the SAGIN.
Submitted 7 August, 2023;
originally announced August 2023.
-
Towards Carbon-Free Electricity: A Flow-Based Framework for Power Grid Carbon Accounting and Decarbonization
Authors:
Xin Chen,
Hungpo Chao,
Wenbo Shi,
Na Li
Abstract:
This paper introduces a comprehensive framework aimed at advancing research and policy development in the realm of decarbonization within electric power systems. The framework focuses on three key aspects: carbon accounting, carbon-aware decision-making, and carbon-electricity market design. It addresses existing problems, methods, and proposes solutions. In contrast to traditional pool-based emissions models, our framework proposes a novel flow-based emissions model. This model incorporates the underlying physical power grid and power flows, allowing for accurate carbon accounting at both temporal and spatial scales. This, in turn, facilitates informed decision-making to achieve grid decarbonization goals. The framework is built on a flow-based accounting methodology and utilizes the carbon-aware optimal power flow (C-OPF) technique as a theoretical foundation for decarbonization decision-making. Additionally, the paper explores the potential design of carbon-electricity markets and pricing mechanisms to incentivize decentralized decarbonization actions. The critical issues of data availability, infrastructure development, and considerations of fairness and equity are also discussed. This paper seeks to advance scholarly understanding and foster progress toward achieving sustainable and carbon-free electric power systems.
Submitted 28 November, 2023; v1 submitted 6 August, 2023;
originally announced August 2023.
-
Carbon-Aware Optimal Power Flow
Authors:
Xin Chen,
Andy Sun,
Wenbo Shi,
Na Li
Abstract:
To facilitate effective decarbonization of the electric power sector, this paper introduces the generic Carbon-aware Optimal Power Flow (C-OPF) method for power system decision-making that considers demand-side carbon accounting and emission management. Built upon the classic optimal power flow (OPF) model, the C-OPF method incorporates carbon emission flow equations and constraints, as well as carbon-related objectives, to jointly optimize power flow and carbon flow. In particular, this paper establishes the invertibility of the carbon flow matrix and proposes modeling and linearization techniques to address the issues of undetermined power flow directions and bilinear terms in the C-OPF model. Additionally, two novel carbon emission models, together with the carbon accounting schemes, for energy storage systems are developed and integrated into the C-OPF model. Numerical simulations demonstrate the characteristics and effectiveness of the C-OPF method, in comparison with OPF solutions.
Submitted 6 August, 2023;
originally announced August 2023.
-
Search for Dark-Matter-Nucleon Interactions with a Dark Mediator in PandaX-4T
Authors:
Di Huang,
Abdusalam Abdukerim,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Chen Cheng,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Changbo Fu,
Mengting Fu,
Lisheng Geng,
Karl Giboni,
Linhui Gu,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
**rong He,
Yanlin Huang,
Zhou Huang,
Ruquan Hou,
Xiangdong Ji
, et al. (70 additional authors not shown)
Abstract:
We report results of a search for dark-matter-nucleon interactions via a dark mediator using optimized low-energy data from the PandaX-4T liquid xenon experiment. With the ionization-signal-only data and utilizing the Migdal effect, we set the most stringent limits on the cross section for dark matter masses ranging from 30~$\rm{MeV/c^2}$ to 2~$\rm{GeV/c^2}$. Under the assumption that the dark mediator is a dark photon that decays into scalar dark matter pairs in the early Universe, we rule out significant parameter space of such a thermal relic dark-matter model.
Submitted 18 December, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
AQUILA: Communication Efficient Federated Learning with Adaptive Quantization in Device Selection Strategy
Authors:
Zihao Zhao,
Yuzhu Mao,
Zhenpeng Shi,
Yang Liu,
Tian Lan,
Wenbo Ding,
Xiao-** Zhang
Abstract:
The widespread adoption of Federated Learning (FL), a privacy-preserving distributed learning methodology, has been impeded by the challenge of high communication overheads, typically arising from the transmission of large-scale models. Existing adaptive quantization methods, designed to mitigate these overheads, operate under the impractical assumption of uniform device participation in every training round. Additionally, these methods are limited in their adaptability due to the necessity of manual quantization level selection and often overlook biases inherent in local devices' data, thereby affecting the robustness of the global model. In response, this paper introduces AQUILA (adaptive quantization in device selection strategy), a novel adaptive framework devised to effectively handle these issues, enhancing the efficiency and robustness of FL. AQUILA integrates a sophisticated device selection method that prioritizes the quality and usefulness of device updates. Utilizing the exact global model stored by devices, it enables a more precise device selection criterion, reduces model deviation, and limits the need for hyperparameter adjustments. Furthermore, AQUILA presents an innovative quantization criterion, optimized to improve communication efficiency while assuring model convergence. Our experiments demonstrate that AQUILA significantly decreases communication costs compared to existing methods, while maintaining comparable model performance across diverse non-homogeneous FL settings, such as Non-IID data and heterogeneous model architectures.
Submitted 4 October, 2023; v1 submitted 31 July, 2023;
originally announced August 2023.
-
BayesDAG: Gradient-Based Posterior Inference for Causal Discovery
Authors:
Yashas Annadani,
Nick Pawlowski,
Joel Jennings,
Stefan Bauer,
Cheng Zhang,
Wenbo Gong
Abstract:
Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on a combination of stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Variational Inference (VI) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluation on synthetic and real-world datasets demonstrates our approach's effectiveness compared to state-of-the-art baselines.
Submitted 8 December, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
Authors:
Wenbo Hu,
Yuling Wang,
Lin Ma,
Bangbang Yang,
Lin Gao,
Xiao Liu,
Yuewen Ma
Abstract:
Despite the tremendous progress in neural radiance fields (NeRF), we still face a dilemma of the trade-off between quality and efficiency, e.g., MipNeRF presents fine-detailed and anti-aliased renderings but takes days for training, while Instant-ngp can accomplish the reconstruction in a few minutes but suffers from blurring or aliasing when rendering at various distances or resolutions due to ignoring the sampling area. To this end, we propose a novel Tri-Mip encoding that enables both instant reconstruction and anti-aliased high-fidelity rendering for neural radiance fields. The key is to factorize the pre-filtered 3D feature spaces in three orthogonal mipmaps. In this way, we can efficiently perform 3D area sampling by taking advantage of 2D pre-filtered feature maps, which significantly elevates the rendering quality without sacrificing efficiency. To cope with the novel Tri-Mip representation, we propose a cone-casting rendering technique to efficiently sample anti-aliased 3D features with the Tri-Mip encoding considering both pixel imaging and observing distance. Extensive experiments on both synthetic and real-world datasets demonstrate our method achieves state-of-the-art rendering quality and reconstruction speed while maintaining a compact representation that reduces model size by 25% compared with Instant-ngp.
Submitted 20 July, 2023;
originally announced July 2023.
-
Decoding Taste Information in Human Brain: A Temporal and Spatial Reconstruction Data Augmentation Method Coupled with Taste EEG
Authors:
Xiuxin Xia,
Yuchao Yang,
Yan Shi,
Wenbo Zheng,
Hong Men
Abstract:
For humans, taste is essential for perceiving food's nutrient content or harmful components. The current sensory evaluation of taste mainly relies on artificial sensory evaluation and electronic tongue, but the former has strong subjectivity and poor repeatability, and the latter is not flexible enough. This work proposed a strategy for acquiring and recognizing taste electroencephalogram (EEG), aiming to decode people's objective perception of taste through taste EEG. Firstly, according to the proposed experimental paradigm, the taste EEG of subjects under different taste stimulation was collected. Secondly, to avoid insufficient training of the model due to the small number of taste EEG samples, a Temporal and Spatial Reconstruction Data Augmentation (TSRDA) method was proposed, which effectively augmented the taste EEG by reconstructing the taste EEG's important features in temporal and spatial dimensions. Thirdly, a multi-view channel attention module was introduced into a designed convolutional neural network to extract the important features of the augmented taste EEG. The proposed method has accuracy of 99.56%, F1-score of 99.48%, and kappa of 99.38%, proving the method's ability to distinguish the taste EEG evoked by different taste stimuli successfully. In summary, combining TSRDA with taste EEG technology provides an objective and effective method for sensory evaluation of food taste.
Submitted 1 July, 2023;
originally announced July 2023.
-
Joint Radio Frequency Fingerprints Identification via Multi-antenna Receiver
Authors:
Xiaofang Chen,
Wenbo Xu,
Yue Wang
Abstract:
In Internet of Things (IoT), radio frequency fingerprints (RFF) technology has been widely used for passive security authentication to identify the special emitter. However, few works took advantage of independent oscillator distortions at the receiver side, and no work has yet considered filtering receiver distortions. In this paper, we investigate the RFF identification (RFFI) involving unknown receiver distortions, where the phase noise caused by each antenna oscillator is independent. Three RFF schemes are proposed according to the number of receiving antennas. When the number is small, the Mutual Information Weighting Scheme (MIWS) is developed by calculating the weighted voting of RFFI result at each antenna; when the number is moderate, the Distortions Filtering Scheme (DFS) is developed by filtering out the channel noise and receiver distortions; when the number is large enough, the Group-Distortions Filtering and Weighting Scheme (GDFWS) is developed, which integrates the advantages of MIWS and DFS. Furthermore, the ability of DFS to filter out the channel noise and receiver distortions is theoretically analyzed at a specific confidence level. Experiments are provided when both channel noise and receiver distortions exist, which verify the effectiveness and robustness of the proposed schemes.
Submitted 10 July, 2023;
originally announced July 2023.
-
Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks
Authors:
Ritesh Ojha,
Wenbo Chen,
Hanyu Zhang,
Reem Khir,
Alan Erera,
Pascal Van Hentenryck
Abstract:
The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Outbound Load Planning Problem (OLPP) that considers flow and load planning challenges jointly in order to adjust loads and flows as the demand forecast changes over time before the day of operations in a terminal. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the OLPP as a mixed-integer programming model and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate terminals. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, this paper proposes a lexicographical optimization approach that eliminates those symmetries by generating optimal solutions staying close to a reference plan. Moreover, this paper designs an optimization proxy that addresses the computational challenges of the optimization model. The optimization proxy combines a machine-learning model and a repair procedure to find near-optimal solutions that satisfy real-time constraints imposed by planners in the loop. An extensive computational study on industrial instances shows that the optimization proxy is orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the OLPP for load consolidation and the significant savings obtained from combining machine learning and optimization.
Submitted 28 April, 2024; v1 submitted 8 July, 2023;
originally announced July 2023.
-
Optical N-plasmon: Topological hydrodynamic excitations in Graphene from repulsive Hall viscosity
Authors:
Wenbo Sun,
Todd Van Mechelen,
Sathwik Bharadwaj,
Ashwin K. Boddeti,
Zubin Jacob
Abstract:
Edge states occurring in Chern and quantum spin-Hall phases are signatures of the topological electronic band structure in two-dimensional (2D) materials. Recently, a new topological electromagnetic phase of graphene characterized by the optical N-invariant has been proposed. Optical N-invariant arises from repulsive Hall viscosity in hydrodynamic many-body electron systems, fundamentally different from the Chern and Z2 invariants. In this paper, we introduce the topologically protected edge excitation -- optical N-plasmon of interacting many-body electron systems in the topological optical N-phase. These optical N-plasmons are signatures of the topological plasmonic band structure in 2D materials. We demonstrate that optical N-plasmons exhibit fundamentally different dispersion relations, stability, and edge profiles from the topologically trivial edge magneto plasmons. Based on the optical N-plasmon, we design an ultra sub-wavelength broadband topological hydrodynamic circulator, which is a chiral quantum radio-frequency circuit component crucial for information routing and interfacing quantum-classical computing systems. Furthermore, we reveal that optical N-plasmons can be effectively tuned by the neighboring dielectric environment without breaking the topological properties. Our work provides a smoking gun signature of repulsive Hall viscosity and opens practical applications of topological electromagnetic phases of two-dimensional materials.
Submitted 1 July, 2023;
originally announced July 2023.
-
Optimal binary gratings for multi-wavelength magneto-optical traps
Authors:
Oliver S. Burrow,
Robert J. Fasano,
Wesley Brand,
Michael W. Wright,
Wenbo Li,
Andrew D. Ludlow,
Erling Riis,
Paul F. Griffin,
Aidan S. Arnold
Abstract:
Grating magneto-optical traps are an enabling quantum technology for portable metrological devices with ultracold atoms. However, beam diffraction efficiency and angle are affected by wavelength, creating a single-optic design challenge for laser cooling in two stages at two distinct wavelengths - as commonly used for loading e.g. Sr or Yb atoms into optical lattice or tweezer clocks. Here, we optically characterize a wide variety of binary gratings at different wavelengths to find a simple empirical fit to experimental grating diffraction efficiency data in terms of dimensionless etch depth and period for various duty cycles. The model avoids complex 3D light-grating surface calculations, yet still yields results accurate to a few percent across a broad range of parameters. Gratings optimized for two (or more) wavelengths can now be designed in an informed manner suitable for a wide class of atomic species enabling advanced quantum technologies.
Submitted 18 November, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction
Authors:
Aoqi Guo,
Junnan Wu,
Peng Gao,
Wenbo Zhu,
Qinwen Guo,
Dazhi Gao,
Yujun Wang
Abstract:
Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.
Submitted 28 June, 2023;
originally announced June 2023.
-
Towards Trustworthy Explanation: On Causal Rationalization
Authors:
Wenbo Zhang,
Tong Wu,
Yunlong Wang,
Yong Cai,
Hengrui Cai
Abstract:
With recent advances in natural language processing, rationalization becomes an essential self-explaining diagram to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet, existing association-based approaches on rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we novelly leverage two causal desiderata, non-spuriousness and efficiency, into rationalization from the causal inference perspective. We formally define a series of probabilities of causation based on a newly proposed structural causal model of rationalization, with its theoretical identification established as the main component of learning necessary and sufficient rationales. The superior performance of the proposed causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods.
Submitted 8 September, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Comparing Deep Learning Models for the Task of Volatility Prediction Using Multivariate Data
Authors:
Wenbo Ge,
Pooia Lalbakhsh,
Leigh Isai,
Artem Lensky,
Hanna Suominen
Abstract:
This study aims to compare multiple deep learning-based forecasters for the task of predicting volatility using multivariate data. The paper evaluates a range of models, starting from simpler and shallower ones and progressing to deeper and more complex architectures. Additionally, the performance of these models is compared against naive predictions and variations of classical GARCH models.
The prediction of volatility for five assets, namely S&P500, NASDAQ100, gold, silver, and oil, is specifically addressed using GARCH models, Multi-Layer Perceptrons, Recurrent Neural Networks, Temporal Convolutional Networks, and the Temporal Fusion Transformer. In the majority of cases, the Temporal Fusion Transformer, followed by variants of the Temporal Convolutional Network, outperformed classical approaches and shallow networks. These experiments were repeated, and the differences observed between the competing models were found to be statistically significant, thus providing strong encouragement for their practical application.
Submitted 23 June, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models
Authors:
Sidi Lu,
Wenbo Zhao,
Chenyang Tao,
Arpit Gupta,
Shanchan Wu,
Tagyoung Chung,
Nanyun Peng
Abstract:
NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. It is designed to avoid catastrophic forgetting while achieving guaranteed convergence to an entropy-maximized closed-form optimal solution with reasonable modeling capacity. Despite the success, several challenges arise when applying NADO to a wide range of scenarios. Vanilla NADO suffers from gradient vanishing for low-probability control signals and is highly reliant on a regularization to satisfy the stochastic version of the Bellman equation. In addition, the vanilla implementation of NADO introduces a few additional transformer layers, suffering from a limited capacity especially compared to other finetune-based model adaptation methods like LoRA. In this paper, we propose an improved version of the NADO algorithm, namely DiNADO (norm-Disentangled NeurAlly-Decomposed Oracles), which improves the performance of the NADO algorithm through disentangling the step-wise global norm over the approximated oracle $R$-value for all potential next-tokens, allowing DiNADO to be combined with finetuning methods like LoRA. We discuss in depth how DiNADO achieves better capacity, stability and flexibility with both empirical and theoretical results. Experiments on formality control in machine translation and the lexically constrained generation task CommonGen demonstrate the significance of the improvements.
Submitted 6 June, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
In Search of netUnicorn: A Data-Collection Platform to Develop Generalizable ML Models for Network Security Problems
Authors:
Roman Beltiukov,
Wenbo Guo,
Arpit Gupta,
Walter Willinger
Abstract:
The remarkable success of the use of machine learning-based solutions for network security problems has been impeded by the developed ML models' inability to maintain efficacy when used in different network environments exhibiting different network behaviors. This issue is commonly referred to as the generalizability problem of ML models. The community has recognized the critical role that training datasets play in this context and has developed various techniques to improve dataset curation to overcome this problem. Unfortunately, these methods are generally ill-suited or even counterproductive in the network security domain, where they often result in unrealistic or poor-quality datasets.
To address this issue, we propose an augmented ML pipeline that leverages explainable ML tools to guide the network data collection in an iterative fashion. To ensure the data's realism and quality, we require that the new datasets should be endogenously collected in this iterative process, thus advocating for a gradual removal of data-related problems to improve model generalizability. To realize this capability, we develop a data-collection platform, netUnicorn, that takes inspiration from the classic "hourglass" model and is implemented as its "thin waist" to simplify data collection for different learning problems from diverse network environments. The proposed system decouples data-collection intents from the deployment mechanisms and disaggregates these high-level intents into smaller reusable, self-contained tasks.
We demonstrate how netUnicorn simplifies collecting data for different learning problems from multiple network environments and how the proposed iterative data collection improves a model's generalizability.
Submitted 10 September, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Accelerating Machine Learning Queries with Linear Algebra Query Processing
Authors:
Wenbo Sun,
Asterios Katsifodimos,
Rihan Hai
Abstract:
The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. The two primary components of traditional predictive pipelines, data processing and model prediction, often operate in separate execution environments, leading to redundant engineering and computations. Additionally, the diverging mathematical foundations of data processing and machine learning hinder cross-optimizations by combining these two components, thereby overlooking potential opportunities to expedite predictive pipelines.
In this paper, we propose an operator fusing method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method leverages linear algebra computation properties to merge operators in machine learning predictions and data processing, significantly accelerating predictive pipelines by up to 317x. We perform a complexity analysis to deliver quantitative insights into the advantages of operator fusion, considering various data and model dimensions. Furthermore, we extensively evaluate matrix multiplication query processing utilizing the widely-used Star Schema Benchmark. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.
Submitted 24 January, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
From NeRFLiX to NeRFLiX++: A General NeRF-Agnostic Restorer Paradigm
Authors:
Kun Zhou,
Wenbo Li,
Nianjuan Jiang,
Xiaoguang Han,
Jiangbo Lu
Abstract:
Neural radiance fields (NeRF) have shown great success in novel view synthesis. However, recovering high-quality details from real-world scenes is still challenging for the existing NeRF-based approaches, due to the potential imperfect calibration information and scene representation inaccuracy. Even with high-quality training frames, the synthetic novel views produced by NeRF models still suffer from notable rendering artifacts, such as noise and blur. To address this, we propose NeRFLiX, a general NeRF-agnostic restorer paradigm that learns a degradation-driven inter-viewpoint mixer. Specifically, we design a NeRF-style degradation modeling approach and construct large-scale training data, enabling the possibility of effectively removing NeRF-native rendering artifacts for deep neural networks. Moreover, beyond the degradation removal, we propose an inter-viewpoint aggregation framework that fuses highly related high-quality training images, pushing the performance of cutting-edge NeRF models to entirely new levels and producing highly photo-realistic synthetic views. Based on this paradigm, we further present NeRFLiX++ with a stronger two-stage NeRF degradation simulator and a faster inter-viewpoint mixer, achieving superior performance with significantly improved computational efficiency. Notably, NeRFLiX++ is capable of restoring photo-realistic ultra-high-resolution outputs from noisy low-resolution NeRF-rendered views. Extensive experiments demonstrate the excellent restoration ability of NeRFLiX++ on various novel view synthesis benchmarks.
Submitted 13 December, 2023; v1 submitted 10 June, 2023;
originally announced June 2023.
-
Unsupervised Melody-to-Lyric Generation
Authors:
Yufei Tian,
Anjali Narayan-Chen,
Shereen Oraby,
Alessandra Cervone,
Gunnar Sigurdsson,
Chenyang Tao,
Wenbo Zhao,
Yiwen Chen,
Tagyoung Chung,
Jing Huang,
Nanyun Peng
Abstract:
Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and second the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings.
Submitted 22 December, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Inferring Private Personal Attributes of Virtual Reality Users from Head and Hand Motion Data
Authors:
Vivek Nair,
Christian Rack,
Wenbo Guo,
Rui Wang,
Shuixian Li,
Brandon Huang,
Atticus Cull,
James F. O'Brien,
Marc Latoschik,
Louis Rosenberg,
Dawn Song
Abstract:
Motion tracking "telemetry" data lies at the core of nearly all modern virtual reality (VR) and metaverse experiences. While generally presumed innocuous, recent studies have demonstrated that motion data actually has the potential to uniquely identify VR users. In this study, we go a step further, showing that a variety of private user information can be inferred just by analyzing motion data recorded from VR devices. We conducted a large-scale survey of VR users (N=1,006) with dozens of questions ranging from background and demographics to behavioral patterns and health information. We then obtained VR motion samples of each user playing the game "Beat Saber," and attempted to infer their survey responses using just their head and hand motion patterns. Using simple machine learning models, over 40 personal attributes could be accurately and consistently inferred from VR motion data alone. Despite this significant observed leakage, there remains limited awareness of the privacy implications of VR motion data, highlighting the pressing need for privacy-preserving mechanisms in multi-user VR applications.
Submitted 10 June, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Client: Cross-variable Linear Integrated Enhanced Transformer for Multivariate Long-Term Time Series Forecasting
Authors:
Jiaxin Gao,
Wenbo Hu,
Yuntian Chen
Abstract:
Long-term time series forecasting (LTSF) is a crucial aspect of modern society, playing a pivotal role in facilitating long-term planning and developing early warning systems. While many Transformer-based models have recently been introduced for LTSF, doubts have been raised regarding the effectiveness of attention modules in capturing cross-time dependencies. In this study, we design a mask-series experiment to validate this assumption and subsequently propose the "Cross-variable Linear Integrated ENhanced Transformer for Multivariate Long-Term Time Series Forecasting" (Client), an advanced model that outperforms both traditional Transformer-based models and linear models. Client employs linear modules to learn trend information and attention modules to capture cross-variable dependencies. Meanwhile, it simplifies the embedding and position encoding layers and replaces the decoder module with a projection layer. Essentially, Client incorporates non-linearity and cross-variable dependencies, which sets it apart from conventional linear models and Transformer-based models. Extensive experiments with nine real-world datasets have confirmed the SOTA performance of Client with the least computation time and memory consumption compared with the previous Transformer-based models. Our code is available at https://github.com/daxin007/Client.
Submitted 30 May, 2023;
originally announced May 2023.
-
Quasi-linear fractional-order operators in Lipschitz domains
Authors:
Juan Pablo Borthagaray,
Wenbo Li,
Ricardo H. Nochetto
Abstract:
We prove Besov boundary regularity for solutions of the homogeneous Dirichlet problem for fractional-order quasi-linear operators with variable coefficients on Lipschitz domains $Ω$ of $\mathbb{R}^d$. Our estimates are consistent with the boundary behavior of solutions on smooth domains and apply to fractional $p$-Laplacians and operators with finite horizon. The proof exploits the underlying variational structure and uses a new and flexible local translation operator. We further apply these regularity estimates to derive novel error estimates for finite element approximations of fractional $p$-Laplacians and present several simulations that reveal the boundary behavior of solutions.
Submitted 28 May, 2023;
originally announced May 2023.
-
Robust Optical Data Encryption by Projection-Photoaligned Polymer-Stabilized-Liquid-Crystals
Authors:
Siying Liu,
Saleh Alfarhan,
Wenbo Wang,
Shuai Feng,
Yuxiang Zhu,
Luyang Liu,
Kenan Song,
Sui Yang,
Kailong Jin,
Xiangfan Chen
Abstract:
The emerging Internet of Things (IoTs) invokes increasing security demands that require robust encryption or anti-counterfeiting technologies. Albeit being acknowledged as efficacious solutions in processing elaborate graphical information via multiple degrees of freedom, optical data encryption and anti-counterfeiting techniques are typically inept in delivering satisfactory performance without compromising the desired ease-of-processibility or compatibility, thus leading to the exploration of novel materials and devices that are competent. Here, a robust optical data encryption technique is demonstrated utilizing polymer-stabilized-liquid-crystals (PSLCs) combined with projection photoalignment and photopatterning methods. The PSLCs possess implicit optical patterns encoded via photoalignment, as well as explicit geometries produced via photopatterning. Furthermore, the PSLCs demonstrate improved robustness against harsh chemical environments and thermal stability, and can be directly deployed onto various rigid and flexible substrates. Based on this, it is demonstrated that a single PSLC is apt to carry intricate information, or serve as an exclusive watermark with both implicit features and explicit geometries. Moreover, a novel, generalized design strategy is developed, for the first time, to encode intricate and exclusive information with enhanced security by spatially programming the photoalignment patterns of a pair of cascade PSLCs, which further illustrates the promising capabilities of PSLCs in optical data encryption and anti-counterfeiting.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Error-Mitigated Quantum Routing on Noisy Devices
Authors:
Wenbo Shi,
Robert Malaney
Abstract:
With sub-threshold quantum error correction on quantum hardware still out of reach, quantum error mitigation methods are currently deemed an attractive option for implementing certain applications on near-term noisy quantum devices. One such application is quantum routing - the ability to map an incoming quantum signal into a superposition of paths. In this work, we use a 7-qubit IBM quantum device to experimentally deploy two promising quantum error mitigation methods, Zero-Noise Extrapolation (ZNE) and Probabilistic Error Cancellation (PEC), in the context of quantum routing. Importantly, beyond investigating the improved performance of quantum routing via ZNE and PEC separately, we also investigate the routing performance provided by the concatenation of these two error-mitigation methods. Our experimental results demonstrate that such concatenation leads to a very significant performance improvement relative to implementation with no error mitigation. Indeed, an almost perfect performance in terms of fidelity of the output entangled paths is found. These new results reveal that with concatenated quantum error-mitigation embedded, useful quantum routing becomes feasible on current devices without the need for quantum error correction - opening up a potential implementation pathway to other applications that utilize a superposition of communication links.
Submitted 22 May, 2023;
originally announced May 2023.
-
Iterative Adversarial Attack on Image-guided Story Ending Generation
Authors:
Youze Wang,
Wenbo Hu,
Richang Hong
Abstract:
Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. Image-guided story ending generation (IgSEG) is a particularly significant task, targeting an understanding of complex relationships between text and image data with a complete story text ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks for multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, allowing for an attack search for adversarial text and image in a more effective iterative way. Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating the potential for improving the adversarial robustness of multimodal text generation models, such as multimodal machine translation, multimodal question answering, etc.
Submitted 23 January, 2024; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition
Authors:
Hong Liu,
Zhaobiao Lv,
Zhijian Ou,
Wenbo Zhao,
Qing Xiao
Abstract:
Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences and are radically different from popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used as a means for calculating sentence scores in speech recognition, but they all use less-modern CNN or LSTM networks. The recent progress in Transformer networks and large pretrained models such as BERT and GPT2 opens new possibilities for further advancing ELMs. In this paper, we explore different architectures of energy functions and different training methods to investigate the capabilities of ELMs in rescoring for speech recognition, all using large pretrained models as backbones.
Submitted 29 May, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Self-Aware Trajectory Prediction for Safe Autonomous Driving
Authors:
Wenbo Shao,
Jun Li,
Hong Wang
Abstract:
Trajectory prediction is one of the key components of the autonomous driving software stack. Accurate prediction of the future movement of surrounding traffic participants is an important prerequisite for ensuring the driving efficiency and safety of intelligent vehicles. Trajectory prediction algorithms based on artificial intelligence have been widely studied and applied in recent years and have achieved remarkable results. However, complex artificial intelligence models are uncertain and difficult to explain, so they may face unintended failures when applied in the real world. In this paper, a self-aware trajectory prediction method is proposed. By introducing a self-awareness module and a two-stage training process, the original trajectory prediction module's performance is estimated online, which helps the system respond in time to scenarios where the prediction function is insufficient and creates conditions for safe and reliable autonomous driving. Comprehensive experiments and analysis are performed, and the proposed method performs well in terms of self-awareness, memory footprint, and real-time performance, showing that it may serve as a promising paradigm for safe autonomous driving.
Submitted 15 May, 2023;
originally announced May 2023.
-
REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
Authors:
Zihao Wang,
Le Ma,
Chen Zhang,
Bo Han,
Yunfei Xu,
Yikai Wang,
Xinyi Chen,
HaoRong Hong,
Wenbo Liu,
Xinda Wu,
Kejun Zhang
Abstract:
Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies. However, music needs real-time arrangement according to changing emotions, bringing challenges to balance emotion real-time fit and soft emotion transition due to the fine-grained and mutable nature of the target emotion. Existing studies mainly focus on achieving emotion real-time fit, while the issue of smooth transition remains understudied, affecting the overall emotional coherence of the music. In this paper, we propose REMAST to address this trade-off. Specifically, we recognize the last timestep's music emotion and fuse it with the current timestep's input emotion. The fused emotion then guides REMAST to generate the music based on the input melody. To adjust music similarity and emotion real-time fit flexibly, we downsample the original melody and feed it into the generation model. Furthermore, we design four music theory features by domain knowledge to enhance emotion information and employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics. These results demonstrate that REMAST achieves real-time fit and smooth transition simultaneously, enhancing the coherence of the generated music.
Submitted 5 February, 2024; v1 submitted 13 May, 2023;
originally announced May 2023.
-
Graph bundles and Ricci-flatness
Authors:
Wenbo Li,
Shiping Liu
Abstract:
We develop a systematic way of constructing S-Ricci flat graphs which are not Abelian Cayley via graph bundles, with explicit examples. For this purpose, we prove that, with some natural constraints, a non-trivial graph bundle cannot be isomorphic (as graphs) to the product of the base graph and fiber graph. This stands in clear contrast to the continuous case.
Submitted 13 May, 2023;
originally announced May 2023.
-
Unsupervised Melody-Guided Lyrics Generation
Authors:
Yufei Tian,
Anjali Narayan-Chen,
Shereen Oraby,
Alessandra Cervone,
Gunnar Sigurdsson,
Chenyang Tao,
Wenbo Zhao,
Tagyoung Chung,
Jing Huang,
Nanyun Peng
Abstract:
Automatic song writing is a topic of significant practical interest. However, its research is largely hindered by the lack of training data due to copyright concerns and challenged by its creative nature. Most noticeably, prior works often fall short of modeling the cross-modal correlation between melody and lyrics due to limited parallel data, hence generating lyrics that are less singable. Existing works also lack effective mechanisms for content control, a much desired feature for democratizing song creation for people with limited music background. In this work, we propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data. Instead, we design a hierarchical lyric generation framework that disentangles training (based purely on text) from inference (melody-guided text generation). At inference time, we leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process. Evaluation results show that our model can generate high-quality lyrics that are more singable, intelligible, coherent, and in rhyme than strong baselines including those supervised on parallel data.
Submitted 25 May, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
Detection of multiple phase space overdensities of GSE stars by orbit integration
Authors:
Wenbo Wu,
Gang Zhao,
Jiang Chang,
Xiang-Xiang Xue,
Yuqin Chen,
Chengdong Li,
Xianhao Ye,
Chengqun Yang
Abstract:
In N-body simulations, nearly radial mergers can form shell-like overdensities in the sky position and phase space ($r-v_r$) due to the combination of dynamical friction and tidal stripping. The merger event of Gaia-Sausage-Enceladus has provided a unique opportunity to study the shells in the phase space. To search for them, we integrate the orbits of 5949 GSE-related halo K giants from the LAMOST survey and record their positions at all time intervals in the $r-v_r$ diagram. After the subtraction of a smoothed background, we find six significant and complete thin chevron-like overdensities. The apocenters $r_\mathrm{apo}$ of stars in the six chevrons are around 6.75, 12.75, 18.75, 25.25, 27.25, and 30.25 kpc. These chevrons reveal the multiple pile-ups of GSE stars at different apocenters. The application of a different Milky Way mass $M_\mathrm{vir}$ will change the opening angles of these chevrons, while leaving their apocenters almost unchanged. By comparing with a recent study of the phase space overdensities of local halo stars from the Gaia RVS survey, our results are more inclined to a medium $M_\mathrm{vir}$ of $10^{12}\,M_\odot$. The application of a non-axisymmetric Galactic potential with a steadily rotating bar has a blurring effect on the appearance of these chevron-like overdensities, especially for the chevrons with $r_\mathrm{apo} > 20$ kpc.
Submitted 11 May, 2023;
originally announced May 2023.
-
PerFedRec++: Enhancing Personalized Federated Recommendation with Self-Supervised Pre-Training
Authors:
Sichun Luo,
Yuanzhang Xiao,
Xinyi Zhang,
Yang Liu,
Wenbo Ding,
Linqi Song
Abstract:
Federated recommendation systems employ federated learning techniques to safeguard user privacy by transmitting model parameters instead of raw user data between user devices and the central server. Nevertheless, the current federated recommender system faces challenges such as heterogeneity and personalization, model performance degradation, and communication bottleneck. Previous studies have attempted to address these issues, but none have been able to solve them simultaneously.
In this paper, we propose a novel framework, named PerFedRec++, to enhance the personalized federated recommendation with self-supervised pre-training. Specifically, we utilize the privacy-preserving mechanism of federated recommender systems to generate two augmented graph views, which are used as contrastive tasks in self-supervised graph learning to pre-train the model. Pre-training enhances the performance of federated models by improving the uniformity of representation learning. Also, by providing a better initial state for federated training, pre-training makes the overall training converge faster, thus alleviating the heavy communication burden. We then construct a collaborative graph to learn the client representation through a federated graph neural network. Based on these learned representations, we cluster users into different user groups and learn personalized models for each cluster. Each user learns a personalized model by combining the global federated model, the cluster-level federated model, and its own fine-tuned local model. Experiments on three real-world datasets show that our proposed method achieves superior performance over existing methods.
Submitted 11 May, 2023;
originally announced May 2023.
-
Visuotactile Sensor Enabled Pneumatic Device Towards Compliant Oropharyngeal Swab Sampling
Authors:
Shoujie Li,
Mingshan He,
Wenbo Ding,
Linqi Ye,
Xueqian Wang,
Junbo Tan,
Jinqiu Yuan,
Xiao-Ping Zhang
Abstract:
Manual oropharyngeal (OP) swab sampling is an intensive and risky task. In this article, a novel OP swab sampling device of low cost and high compliance is designed by combining the visuo-tactile sensor and the pneumatic actuator-based gripper. Here, a concave visuo-tactile sensor called CoTac is first proposed to address the problems of high cost and poor reliability of traditional multi-axis force sensors. Besides, by imitating the doctor's fingers, a soft pneumatic actuator with a rigid skeleton structure is designed, which is demonstrated to be reliable and safe via finite element modeling and experiments. Furthermore, we propose a sampling method that adopts a compliant control algorithm based on the adaptive virtual force to enhance the safety and compliance of the swab sampling process. The effectiveness of the device has been verified through sampling experiments as well as in vivo tests, indicating great application potential. The cost of the device is around 30 US dollars and the total weight of the functional part is less than 0.1 kg, allowing the device to be rapidly deployed on various robotic arms. Videos, hardware, and source code are available at: https://sites.google.com/view/swab-sampling/.
Submitted 10 May, 2023;
originally announced May 2023.
-
Trustworthy Multi-phase Liver Tumor Segmentation via Evidence-based Uncertainty
Authors:
Chuanfei Hu,
Tianyi Xia,
Ying Cui,
Quchen Zou,
Yuancheng Wang,
Wenbo Xiao,
Shenghong Ju,
Xinde Li
Abstract:
Multi-phase liver contrast-enhanced computed tomography (CECT) images convey complementary multi-phase information for liver tumor segmentation (LiTS), which is crucial for assisting the clinical diagnosis of liver cancer. However, the performance of existing multi-phase liver tumor segmentation (MPLiTS)-based methods suffers from redundancy and weak interpretability, resulting in implicit unreliability in clinical applications. In this paper, we propose a novel trustworthy multi-phase liver tumor segmentation (TMPLiTS), which is a unified framework jointly conducting segmentation and uncertainty estimation. The trustworthy results could assist clinicians in making a reliable diagnosis. Specifically, Dempster-Shafer Evidence Theory (DST) is introduced to parameterize the segmentation and uncertainty as evidence following a Dirichlet distribution. The reliability of segmentation results among multi-phase CECT images is quantified explicitly. Meanwhile, a multi-expert mixture scheme (MEMS) is proposed to fuse the multi-phase evidences, which can guarantee the effect of the fusion procedure based on theoretical analysis. Experimental results demonstrate the superiority of TMPLiTS compared with the state-of-the-art methods. Meanwhile, the robustness of TMPLiTS is verified, where reliable performance can be guaranteed against perturbations.
Submitted 20 June, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Fundamental Limits of Distributed Linearly Separable Computation under Cyclic Assignment
Authors:
Wenbo Huang,
Kai Wan,
Hua Sun,
Mingyue Ji,
Robert Caiming Qiu,
Giuseppe Caire
Abstract:
This paper studies the master-worker distributed linearly separable computation problem, where the considered computation task, referred to as a linearly separable function, is a typical linear transform model widely used in cooperative distributed gradient coding, real-time rendering, linear transformers, etc. A master asks $\mathsf{N}$ distributed workers to compute a linearly separable function from $\mathsf{K}$ datasets. The computation task on $\mathsf{K}$ datasets can be expressed as $\mathsf{K}_{\rm c}$ linear combinations of $\mathsf{K}$ messages, where each message is the output of an individual function on one dataset. The straggler effect is also considered, such that from the answers of any $\mathsf{N}_{\rm r}$ of the $\mathsf{N}$ distributed workers, the master should accomplish the task. The computation cost is defined as the number of datasets assigned to each worker, while the communication cost is defined as the number of (coded) messages that should be received. The objective is to characterize the optimal tradeoff between the computation and communication costs. The problem has so far remained open, even under the cyclic data assignment. Since in fact various distributed computing schemes were proposed in the literature under the cyclic data assignment, with this paper we close the problem for the cyclic assignment. This paper proposes a new computing scheme with the cyclic assignment based on the concept of interference alignment, by treating each message which cannot be computed by a worker as an interference from this worker. Under the cyclic assignment, the proposed computing scheme is then proved to be optimal when $\mathsf{N}=\mathsf{K}$ and to be order optimal within a factor of $2$ otherwise.
Submitted 19 February, 2024; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Site-testing at the Muztagh-ata Site.V. Nighttime Cloud Amount during the Last Five Years
Authors:
Jing Xu,
Guo-jie Feng,
Guang-xin Pu,
Le-tian Wang,
Zi-Huang Cao,
Li-Qing Ren,
Xuan Zhang,
Shu-guo Ma,
Chun-hai Bai,
Ali Esamdin,
Jian Li,
Yuan Tian,
Zheng Wang,
Yong-heng Zhao,
Jian-rong Shi
Abstract:
The clarity of nights is the major factor that should be carefully considered for optical/infrared astronomical observatories in site-testing campaigns. Cloud coverage is directly related to the amount of time available for scientific observations at observatories. In this article, we report on the results of detailed night-time cloud statistics and continuous observing time derived from ground-based all-sky cameras at the Muztagh-ata site from 2017 to 2021. Results obtained from the acquired data show that the proportion of annual observing time at the Muztagh-ata site is 65%, and the best period, with the least cloud coverage and longer continuous observing time, is from September to February. We compared the monthly mean observing nights obtained from our all-sky cameras with those from the CLARA dataset; the results show that the discrepancy between them may depend on the cloud top heights. On average, this site can provide 175 clear nights and 169 nights with at least 4 hours of continuous observing time per year.
Submitted 8 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Preparation of multiphoton high-dimensional GHZ state
Authors:
Wen-Bo Xing,
Xiao-Min Hu,
Yu Guo,
Bi-Heng Liu,
Chuan-Feng Li,
Guang-Can Guo
Abstract:
Multipartite high-dimensional entanglement presents different physics from multipartite two-dimensional entanglement. However, preparing multipartite high-dimensional entanglement with linear optics is still a challenge. In this paper, a protocol for preparing multiphoton GHZ states of arbitrary dimension is proposed for optical systems. In this protocol, we use auxiliary entanglements to realize a high-dimensional entanglement gate, so that high-dimensional entangled pairs can be connected into a multipartite high-dimensional GHZ state. Specifically, we give an example of using photons' path degree of freedom to prepare a 4-particle 3-dimensional GHZ state. Our method can be extended to other degrees of freedom and can generate arbitrary GHZ entanglement in any dimension.
Submitted 26 July, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction
Authors:
Sixu Li,
Chaojian Li,
Wenbo Zhu,
Boyang Yu,
Yang Zhao,
Cheng Wan,
Haoran You,
Huihong Shi,
Yingyan Lin
Abstract:
Neural Radiance Field (NeRF) based 3D reconstruction is highly desirable for immersive Augmented and Virtual Reality (AR/VR) applications, but achieving instant (i.e., < 5 seconds) on-device NeRF training remains a challenge. In this work, we first identify the inefficiency bottleneck: the need to interpolate NeRF embeddings up to 200,000 times from a 3D embedding grid during each training iteration. To alleviate this, we propose Instant-3D, an algorithm-hardware co-design acceleration framework that achieves instant on-device NeRF training. Our algorithm decomposes the embedding grid representation in terms of color and density, enabling computational redundancy to be squeezed out by adopting different (1) grid sizes and (2) update frequencies for the color and density branches. Our hardware accelerator further reduces the dominant memory accesses for embedding grid interpolation by (1) mapping multiple nearby points' memory read requests into one during the feed-forward process, (2) merging embedding grid updates from the same sliding time window during back-propagation, and (3) fusing different computation cores to support the different grid sizes needed by the color and density branches of the Instant-3D algorithm. Extensive experiments validate the effectiveness of Instant-3D, achieving a large training time reduction of 41x - 248x while maintaining the same reconstruction quality. Excitingly, Instant-3D has enabled instant 3D reconstruction for AR/VR, requiring a reconstruction time of only 1.6 seconds per scene and meeting the AR/VR power consumption constraint of 1.9 W.
Submitted 14 January, 2024; v1 submitted 24 April, 2023;
originally announced April 2023.
-
End-to-End Feasible Optimization Proxies for Large-Scale Economic Dispatch
Authors:
Wenbo Chen,
Mathieu Tanneau,
Pascal Van Hentenryck
Abstract:
The paper proposes a novel End-to-End Learning and Repair (E2ELR) architecture for training optimization proxies for economic dispatch problems. E2ELR combines deep neural networks with closed-form, differentiable repair layers, thereby integrating learning and feasibility in an end-to-end fashion. E2ELR is also trained with self-supervised learning, removing the need for labeled data and the solving of numerous optimization problems offline. E2ELR is evaluated on industry-size power grids with tens of thousands of buses using an economic dispatch that co-optimizes energy and reserves. The results demonstrate that the self-supervised E2ELR achieves state-of-the-art performance, with optimality gaps that outperform other baselines by at least an order of magnitude.
Submitted 18 August, 2023; v1 submitted 23 April, 2023;
originally announced April 2023.
-
On vanishing of fundamental forms of algebraic varieties
Authors:
Lawrence Ein,
Wenbo Niu
Abstract:
We study fundamental forms of algebraic varieties using the sheaves of principal parts of line bundles and establish a vanishing theorem for fundamental forms of any order. We also relate fundamental forms to the higher order Gauss map and higher order tangent varieties.
Submitted 17 April, 2023;
originally announced April 2023.