-
Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Authors:
Mengyi Shan,
Lu Dong,
Yutao Han,
Yuan Yao,
Tao Liu,
Ifeoma Nwogu,
Guo-Jun Qi,
Mitch Hill
Abstract:
This work aims to generate natural and diverse group motions of multiple humans from textual descriptions. While single-person text-to-motion generation is extensively studied, it remains challenging to synthesize motions for more than one or two subjects from in-the-wild prompts, mainly due to the lack of available datasets. In this work, we curate human pose and motion datasets by estimating pose information from large-scale image and video datasets. Our models use a transformer-based diffusion framework that accommodates multiple datasets with any number of subjects or frames. Experiments explore both generation of multi-person static poses and generation of multi-person motion sequences. To our knowledge, our method is the first to generate multi-subject motion sequences with high diversity and fidelity from a large variety of textual prompts.
Submitted 28 May, 2024;
originally announced May 2024.
-
Label-Efficient 3D Object Detection For Road-Side Units
Authors:
Minh-Quan Dao,
Holger Caesar,
Julie Stephany Berrio,
Mao Shan,
Stewart Worrall,
Vincent Frémont,
Ezio Malis
Abstract:
Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted considerable research interest owing to its ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSUs), thereby minimizing the impact of occlusion. While significant advancement has been made, the data-hungry nature of these methods creates a major hurdle for their real-world deployment, particularly due to the need for annotated RSU data. Manually annotating the vast amount of RSU data required for training is prohibitively expensive, given the sheer number of intersections and the effort involved in annotating point clouds. We address this challenge by devising a label-efficient object detection method for RSUs based on unsupervised object discovery. Our paper introduces two new modules: one for object discovery based on a spatial-temporal aggregation of point clouds, and another for refinement. Furthermore, we demonstrate that fine-tuning on a small portion of annotated data allows our object discovery models to narrow the performance gap with, or even surpass, fully supervised models. Extensive experiments are carried out on simulated and real-world datasets to evaluate our method.
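A roadside unit is static, so spatial-temporal aggregation for object discovery can be pictured as background subtraction over accumulated sweeps. Voxels occupied in nearly every sweep are treated as static background, and points in rarely occupied voxels become object candidates. The sketch below only illustrates that idea. It is an assumption, not the paper's discovery module. The function name `dynamic_points` and the parameters `voxel` and `static_ratio` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxel_key(p: Point, voxel: float) -> Tuple[int, int, int]:
    """Index of the cubic voxel of side `voxel` (metres) containing point p."""
    return (math.floor(p[0] / voxel), math.floor(p[1] / voxel), math.floor(p[2] / voxel))


def dynamic_points(sweeps: Sequence[Sequence[Point]],
                   voxel: float = 0.5,
                   static_ratio: float = 0.8) -> List[Point]:
    """Return points of the latest sweep that fall in rarely occupied voxels.

    Assumes all sweeps are already expressed in the fixed RSU sensor frame.
    """
    if not sweeps:
        return []
    counts: Counter = Counter()
    for sweep in sweeps:
        # Count each voxel at most once per sweep.
        counts.update({voxel_key(p, voxel) for p in sweep})
    n = len(sweeps)
    latest = sweeps[-1]
    return [p for p in latest if counts[voxel_key(p, voxel)] / n < static_ratio]


wall = (0.1, 0.1, 0.1)  # static structure seen in every sweep
sweeps = [[wall, (5.1, 0.0, 0.0)], [wall, (6.1, 0.0, 0.0)], [wall, (7.1, 0.0, 0.0)]]
print(dynamic_points(sweeps))  # [(7.1, 0.0, 0.0)]
```

In practice the threshold would trade missed slow-moving or parked objects against background leakage. A refinement stage, like the one the abstract mentions, would then be needed to turn candidate points into boxes.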
Submitted 9 April, 2024;
originally announced April 2024.
-
On generalizations of Iwasawa's theorem
Authors:
Jiangtao Shi,
Fanjie Xu,
Mengjiao Shan
Abstract:
Iwasawa's theorem states that a finite group $G$ is supersolvable if and only if all maximal chains of the identity in $G$ have the same length. As generalizations of Iwasawa's theorem, we provide some characterizations of the structure of a finite group $G$ in which all maximal chains of every minimal subgroup have the same length. Moreover, letting $\delta(G)$ denote the number of subgroups of $G$ whose maximal chains in $G$ do not all have the same length, we prove that $G$ is a non-solvable group with $\delta(G)\leq 16$ if and only if $G\cong A_5$.
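The following worked example makes the hypothesis of Iwasawa's theorem concrete. It was added for illustration and uses standard facts about $A_5$, not results quoted from the paper. It shows two maximal chains of the identity in $A_5$ with different lengths, which is consistent with $A_5$ not being supersolvable. Every inclusion has prime index, and is therefore maximal, except $D_{10} < A_5$. That inclusion is also maximal, because $D_{10}$ is the normalizer of a Sylow 5-subgroup, a maximal subgroup of $A_5$.

```latex
\begin{aligned}
&H = H_0 < H_1 < \cdots < H_n = G,\quad H_i \text{ maximal in } H_{i+1} \quad (\text{length } n),\\
&1 < C_2 < V_4 < A_4 < A_5 \quad (\text{length } 4),\\
&1 < C_5 < D_{10} < A_5 \quad (\text{length } 3).
\end{aligned}
```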
Submitted 13 March, 2024;
originally announced March 2024.
-
OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction
Authors:
Zhenxing Ming,
Julie Stephany Berrio,
Mao Shan,
Stewart Worrall
Abstract:
A comprehensive understanding of 3D scenes is crucial for autonomous vehicles (AVs), and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction rely heavily on surround-view camera images, making them susceptible to changes in lighting and weather conditions. This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes and SemanticKITTI datasets, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OccFusion.
Submitted 9 May, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction
Authors:
Zhenxing Ming,
Julie Stephany Berrio,
Mao Shan,
Stewart Worrall
Abstract:
This paper introduces InverseMatrixVT3D, an efficient method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Existing methods for constructing 3D volumes often rely on depth estimation, device-specific operators, or transformer queries, which hinders the widespread adoption of 3D occupancy models. In contrast, our approach leverages two projection matrices to store the static mapping relationships and matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. Specifically, we achieve this by performing matrix multiplications between multi-view image feature maps and two sparse projection matrices. We introduce a sparse matrix handling technique for the projection matrices to optimize GPU memory usage. Moreover, a global-local attention fusion module is proposed to integrate the global BEV features with the local 3D feature volumes to obtain the final 3D volume. We also employ a multi-scale supervision mechanism to further enhance performance. Extensive experiments performed on the nuScenes and SemanticKITTI datasets reveal that our approach not only stands out for its simplicity and effectiveness but also achieves top performance in detecting vulnerable road users (VRU), crucial for autonomous driving and road safety. The code has been made available at: https://github.com/DanielMing123/InverseMatrixVT3D
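The core operation described above is a matrix product between flattened multi-view image features and a sparse projection matrix. It can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' code. It assumes a precomputed static lookup from each output cell to the pixel indices that project into it. It also assumes uniform averaging weights, which the abstract does not specify. `build_projection` and `sparse_matmul` are hypothetical names.

```python
from typing import Dict, List, Tuple

# One sparse row per output cell (BEV cell or 3D voxel): a list of
# (flattened multi-view pixel index, weight) pairs, i.e. a CSR-style matrix.
SparseRows = List[List[Tuple[int, float]]]


def build_projection(mapping: Dict[int, List[int]], num_cells: int) -> SparseRows:
    """Build a row-normalised sparse projection matrix from a static
    cell -> pixel-index lookup (hypothetical; computed once from calibration)."""
    rows: SparseRows = [[] for _ in range(num_cells)]
    for cell, pixels in mapping.items():
        if pixels:
            weight = 1.0 / len(pixels)  # average over all pixels the cell hits
            rows[cell] = [(p, weight) for p in pixels]
    return rows


def sparse_matmul(rows: SparseRows, feats: List[List[float]]) -> List[List[float]]:
    """Compute P @ F for sparse P (num_cells x num_pixels) and dense
    F (num_pixels x C). Cells with no pixels receive zero features."""
    channels = len(feats[0]) if feats else 0
    out: List[List[float]] = []
    for row in rows:
        acc = [0.0] * channels
        for p, w in row:  # only the stored nonzero entries are visited
            for c in range(channels):
                acc[c] += w * feats[p][c]
        out.append(acc)
    return out


# Four flattened pixels with two feature channels each.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
P = build_projection({0: [0, 1], 1: [3], 2: []}, num_cells=3)
print(sparse_matmul(P, feats))  # [[2.0, 3.0], [7.0, 8.0], [0.0, 0.0]]
```

The mapping depends only on calibration, so the projection matrix can be built once and reused for every frame. Storing only its nonzero entries keeps memory proportional to the number of valid cell-pixel correspondences rather than to the full cell-by-pixel grid.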
Submitted 29 April, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
OmniMotionGPT: Animal Motion Generation with Limited Data
Authors:
Zhangsihao Yang,
Mingyuan Zhou,
Mengyi Shan,
Bingbing Wen,
Ziwei Xuan,
Mitch Hill,
Junjie Bai,
Guo-Jun Qi,
Yalin Wang
Abstract:
Our paper aims to generate diverse and realistic animal motion sequences from textual descriptions, without a large-scale animal text-motion dataset. While the task of text-driven human motion synthesis is already extensively studied and benchmarked, it remains challenging to transfer this success to other skeleton structures with limited data. In this work, we design a model architecture that imitates the Generative Pre-trained Transformer (GPT), transferring prior knowledge learned from human data to the animal domain. We jointly train motion autoencoders for both animal and human motions while simultaneously optimizing the similarity scores among the human motion encoding, the animal motion encoding, and the CLIP text embedding. Presenting the first solution to this problem, we are able to generate animal motions with high diversity and fidelity, quantitatively and qualitatively outperforming human motion generation baselines trained on animal data. Additionally, we introduce AnimalML3D, the first text-animal motion dataset, with 1240 animation sequences spanning 36 different animal identities. We hope this dataset will mitigate the data scarcity problem in text-driven animal motion generation, providing a new playground for the research community.
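The abstract says the model is optimized through similarity scores among the human motion encoding, the animal motion encoding, and the CLIP text embedding. It does not give the exact objective. One plausible reading is a pairwise cosine-similarity alignment term. The sketch below is an assumption for illustration only, and `alignment_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(human: Sequence[float], animal: Sequence[float],
                   text: Sequence[float]) -> float:
    """Hypothetical loss: sum of (1 - cosine similarity) over the three
    embedding pairs; it is 0 when all three point in the same direction."""
    pairs = [(human, text), (animal, text), (human, animal)]
    return sum(1.0 - cosine(u, v) for u, v in pairs)


print(alignment_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0]))  # 2.0
```

Cosine similarity ignores vector magnitude. That makes it a natural choice when embeddings from different encoders, such as a motion autoencoder and CLIP, live at different scales.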
Submitted 30 November, 2023;
originally announced November 2023.
-
Emergent spin-glass state in the doped Hund's metal CsFe2As2
Authors:
S. J. Li,
D. Zhao,
S. Wang,
S. T. Cui,
N. Z. Wang,
J. Li,
D. W. Song,
B. L. Kang,
L. X. Zheng,
L. P. Nie,
Z. M. Wu,
Y. B. Zhou,
M. Shan,
Z. Sun,
T. Wu,
X. H. Chen
Abstract:
Hund's metal is a kind of correlated metal in which the electronic correlation is strongly influenced by the Hund's interaction. At high temperatures, while the charge and orbital degrees of freedom are quenched, the spin degrees of freedom can persist in the form of frozen moments. As temperature decreases, a coherent electronic state with characteristic orbital differentiation always emerges through an incoherent-to-coherent crossover, which has been widely observed in iron-based superconductors (e.g., iron selenides and AFe2As2 (A = K, Rb, Cs)). Consequently, the above frozen moments are "screened" by coupling to orbital degrees of freedom, leading to an emergent Fermi-liquid state. In contrast, the coupling among frozen moments should impede the formation of the Fermi-liquid state through competitive magnetic ordering, which remains unexplored in Hund's metals. Here, in the iron-based Hund's metal CsFe2As2, we adopt chemical substitution of Cr/Co atoms at the iron sites to explore the competitive magnetic ordering. Through a comprehensive study of resistivity, magnetic susceptibility, specific heat and nuclear magnetic resonance, we demonstrate that the Fermi-liquid state is destroyed in Cr-doped CsFe2As2 by a spin-freezing transition below $T_g \sim 22$ K. Meanwhile, the evolution of charge degrees of freedom measured by angle-resolved photoemission spectroscopy also supports the competition between the Fermi-liquid state and the spin-glass state.
Submitted 26 October, 2023;
originally announced October 2023.
-
Magnetic-field-induced electronic instability of Weyl-like fermions in compressed black phosphorus
Authors:
Lixuan Zheng,
Kaifa Luo,
Zeliang Sun,
Dan Zhao,
Jian Li,
Dianwu Song,
Shunjiao Li,
Baolei Kang,
Linpeng Nie,
Min Shan,
Zhimian Wu,
Yanbing Zhou,
Xi Dai,
Hongming Weng,
Rui Yu,
Tao Wu,
Xianhui Chen
Abstract:
Revealing the role of Coulomb interaction in topological semimetals with Dirac/Weyl-like band dispersion shapes a new frontier in condensed matter physics. Topological node-line semimetals (TNLSMs), anticipated as a fertile ground for exploring electronic correlation effects due to the anisotropy associated with their node-line structure, have recently attracted considerable attention. In this study, we report an experimental observation of correlation effects in TNLSMs realized by black phosphorus (BP) under hydrostatic pressure. By combining nuclear magnetic resonance measurements and band calculations on compressed BP, we identify a magnetic-field-induced electronic instability of Weyl-like fermions under an external magnetic field parallel to the so-called nodal ring in reciprocal space. Anomalous spin fluctuations, serving as the fingerprint of this electronic instability, are observed at low temperatures and reach a maximum at approximately 1.0 GPa. This study presents compressed BP as a realistic material platform for exploring the rich physics of strongly coupled Weyl-like fermions.
Submitted 24 October, 2023;
originally announced October 2023.
-
Classification of Safety Driver Attention During Autonomous Vehicle Operation
Authors:
Santiago Gerling Konrad,
Julie Stephany Berrio,
Mao Shan,
Favio Masson,
Stewart Worrall
Abstract:
Despite the continual advances in Advanced Driver Assistance Systems (ADAS) and the development of high-level autonomous vehicles (AVs), there is a general consensus that, in the short to medium term, a human supervisor is required to handle the edge cases that inevitably arise. Given this requirement, it is essential that the state of the vehicle operator is monitored to ensure they are contributing to the vehicle's safe operation. This paper introduces a dual-source approach that integrates data from an infrared camera facing the vehicle operator with data from the vehicle perception systems to produce a metric for driver alertness, in order to promote and ensure safe operator behaviour. The infrared camera detects the driver's head, enabling the calculation of head orientation, which is relevant because the head typically moves according to the individual's focus of attention. By incorporating environmental data from the perception system, it becomes possible to determine whether the vehicle operator observes objects in the surroundings. Experiments were conducted using data collected in Sydney, Australia, simulating AV operations in an urban environment. Our results demonstrate that the proposed system effectively determines a metric for the attention levels of the vehicle operator, enabling interventions such as warnings or reduced autonomous functionality as appropriate. This comprehensive solution shows promise in contributing to the overall safety and efficiency of ADAS and AVs in a real-world setting.
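The abstract combines head orientation from the infrared camera with object positions from the perception system to decide whether the operator observes surrounding objects. The sketch below shows one simple way such a check could work. It assumes a yaw-only view cone with a hypothetical 30-degree half-angle, and a vehicle-frame convention chosen here for illustration. The paper's actual metric may differ, and `is_observed` and `attention_score` are hypothetical names.

```python
import math
from typing import Iterable, Tuple


def is_observed(head_yaw_deg: float, obj_x: float, obj_y: float,
                half_fov_deg: float = 30.0) -> bool:
    """True if the object lies within +/- half_fov_deg of the head yaw.

    Assumed vehicle frame: x forward, y left, yaw 0 = straight ahead,
    positive yaw = turned to the left.
    """
    bearing = math.degrees(math.atan2(obj_y, obj_x))
    # Wrap the angular difference into [-180, 180).
    diff = (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_fov_deg


def attention_score(head_yaw_deg: float,
                    objects: Iterable[Tuple[float, float]]) -> float:
    """Fraction of perceived objects inside the driver's assumed view cone."""
    objs = list(objects)
    if not objs:
        return 1.0  # nothing to look at; treated as attentive (an assumption)
    seen = sum(is_observed(head_yaw_deg, x, y) for x, y in objs)
    return seen / len(objs)


# One object straight ahead, one directly to the left.
print(attention_score(0.0, [(10.0, 0.0), (0.0, 10.0)]))  # 0.5
```

A deployed system would likely smooth this score over time before triggering warnings or reducing autonomous functionality, since single-frame head poses are noisy.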
Submitted 17 October, 2023;
originally announced October 2023.
-
Animating Street View
Authors:
Mengyi Shan,
Brian Curless,
Ira Kemelmacher-Shlizerman,
Steve Seitz
Abstract:
We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles. Our approach is to remove existing people and vehicles from the input image, insert moving objects with proper scale, angle, motion, and appearance, plan paths and traffic behavior, as well as render the scene with plausible occlusion and shadowing effects. The system achieves these by reconstructing the still image street scene, simulating crowd behavior, and rendering with consistent lighting, visibility, occlusions, and shadows. We demonstrate results on a diverse range of street scenes including regular still images and panoramas.
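Inserting pedestrians and vehicles with proper scale requires relating an object's real height to its image height at a given ground position. A standard single-view relation is $h_{\text{px}} = (v_b - v_h)\,H / h_c$. It assumes a level pinhole camera over flat ground, an assumption made here and not a detail taken from the paper. In the formula, $v_b$ is the image row of the object's ground contact point, $v_h$ the horizon row, $H$ the object's height, and $h_c$ the camera height. A minimal sketch, with a hypothetical function name:

```python
def object_pixel_height(v_bottom: float, v_horizon: float,
                        object_height_m: float, camera_height_m: float) -> float:
    """Pixel height of an upright object standing on flat ground.

    Assumes a level pinhole camera, image rows increasing downwards,
    and a ground contact point below the horizon row.
    """
    if v_bottom <= v_horizon:
        raise ValueError("ground contact must lie below the horizon row")
    if camera_height_m <= 0:
        raise ValueError("camera height must be positive")
    return (v_bottom - v_horizon) * object_height_m / camera_height_m


# A 1.7 m pedestrian whose feet are 200 rows below the horizon, seen from
# a camera 2.5 m above the ground, should be drawn roughly 136 px tall.
print(object_pixel_height(600.0, 400.0, 1.7, 2.5))
```

The relation follows from the pinhole model. With focal length $f$ and depth $Z$, the ground contact row is $v_h + f h_c / Z$ and the object's top row is $v_h + f (h_c - H) / Z$. Their difference is $f H / Z = (v_b - v_h) H / h_c$, so depth never has to be estimated explicitly.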
Submitted 12 October, 2023;
originally announced October 2023.
-
Real Effect or Bias? Best Practices for Evaluating the Robustness of Real-World Evidence through Quantitative Sensitivity Analysis for Unmeasured Confounding
Authors:
Douglas Faries,
Chenyin Gao,
Xiang Zhang,
Chad Hazlett,
James Stamey,
Shu Yang,
Peng Ding,
Mingyang Shan,
Kristin Sheffield,
Nancy Dreyer
Abstract:
The assumption of no unmeasured confounders is critical but unverifiable for causal inference, yet quantitative sensitivity analyses to assess the robustness of real-world evidence remain underutilized. The lack of use is likely due in part to the complexity of implementation and to the often specific and restrictive data requirements of each method. With the advent of broadly applicable sensitivity analysis methods that do not require identification of a specific unmeasured confounder, along with publicly available code for implementation, the roadblocks toward broader use are decreasing. To spur greater application, here we present best practice guidance to address the potential for unmeasured confounding at both the design and analysis stages, including a set of framing questions and an analytic toolbox for researchers. The questions at the design stage guide the researcher through steps evaluating the potential robustness of the design while encouraging the gathering of additional data to reduce uncertainty due to potential confounding. At the analysis stage, the questions guide researchers in quantifying the robustness of the observed result, providing a clearer indication of the robustness of their conclusions. We demonstrate the application of the guidance using simulated data based on a real-world fibromyalgia study, applying multiple methods from our analytic toolbox for illustration purposes.
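The abstract refers to sensitivity analysis methods that do not require identifying a specific unmeasured confounder, without naming them. One widely used example of this class is the E-value of VanderWeele and Ding. The sketch below implements its standard risk-ratio formula, $E = RR + \sqrt{RR\,(RR-1)}$ for $RR \ge 1$, inverting the ratio first when $RR < 1$. It illustrates the idea only and is not the paper's toolbox. The function names are hypothetical.

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio point estimate.

    The minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with both treatment and outcome
    (conditional on measured covariates) to fully explain away the estimate.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects: invert before applying the formula
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1.0 <= upper:
        return 1.0  # the interval already includes the null
    limit = lower if lower > 1.0 else upper
    return e_value(limit)


print(round(e_value(2.0), 2))  # 3.41
```

An observed risk ratio of 2 could thus be explained away only by a confounder associated with both treatment and outcome by risk ratios of at least about 3.41 each. Reporting the E-value for the confidence limit as well indicates how robust the conclusion of a non-null effect is.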
Submitted 13 September, 2023;
originally announced September 2023.
-
MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaptation in 3D Object Detection
Authors:
Darren Tsai,
Julie Stephany Berrio,
Mao Shan,
Eduardo Nebot,
Stewart Worrall
Abstract:
Deploying 3D detectors in unfamiliar domains has been demonstrated to result in a significant 70-90% drop in detection rate due to variations in lidar, geography, or weather from their training dataset. This domain gap leads to missing detections for densely observed objects, misaligned confidence scores, and increased high-confidence false positives, rendering the detector highly unreliable. To address this, we introduce MS3D++, a self-training framework for multi-source unsupervised domain adaptation in 3D object detection. MS3D++ generates high-quality pseudo-labels, allowing 3D detectors to achieve high performance on a range of lidar types, regardless of their density. Our approach effectively fuses predictions of an ensemble of multi-frame pre-trained detectors from different source domains to improve domain generalization. We subsequently refine predictions temporally to ensure temporal consistency in box localization and object classification. Furthermore, we present an in-depth study into the performance and idiosyncrasies of various 3D detector components in a cross-domain context, providing valuable insights for improved cross-domain detector ensembling. Experimental results on Waymo, nuScenes and Lyft demonstrate that detectors trained with MS3D++ pseudo-labels achieve state-of-the-art performance, comparable to training with human-annotated labels in Bird's Eye View (BEV) evaluation for both low and high density lidar. Code is available at https://github.com/darrenjkt/MS3D
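The abstract describes fusing the predictions of an ensemble of pre-trained detectors into pseudo-labels, but it does not detail the fusion rule. The sketch below is a generic illustration of ensemble box fusion. It is a simplified, swapped-in technique, not MS3D++'s own fusion. It greedily clusters BEV boxes by centre distance and averages each cluster with confidence weights. Headings are omitted for brevity, and `Box`, `fuse_boxes`, and `dist_thresh` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Simplified axis-aligned BEV box: centre (x, y), size (l, w), confidence."""
    x: float
    y: float
    l: float
    w: float
    score: float


def _wmean(values: List[float], weights: List[float]) -> float:
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def fuse_boxes(boxes: List[Box], dist_thresh: float = 1.0) -> List[Box]:
    """Greedy, confidence-weighted fusion of boxes pooled from several detectors.

    Boxes whose centres lie within dist_thresh metres of the highest-scoring
    remaining box are merged into one box by score-weighted averaging.
    """
    remaining = sorted(boxes, key=lambda b: b.score, reverse=True)
    fused: List[Box] = []
    while remaining:
        anchor = remaining[0]  # always joins its own cluster, so the loop ends
        near = [math.hypot(b.x - anchor.x, b.y - anchor.y) <= dist_thresh
                for b in remaining]
        cluster = [b for b, keep in zip(remaining, near) if keep]
        remaining = [b for b, keep in zip(remaining, near) if not keep]
        weights = [max(b.score, 1e-6) for b in cluster]  # guard against zero scores
        fused.append(Box(
            x=_wmean([b.x for b in cluster], weights),
            y=_wmean([b.y for b in cluster], weights),
            l=_wmean([b.l for b in cluster], weights),
            w=_wmean([b.w for b in cluster], weights),
            score=sum(b.score for b in cluster) / len(cluster),
        ))
    return fused


pooled = [Box(0.0, 0.0, 4.0, 2.0, 0.9), Box(0.2, 0.0, 4.2, 2.0, 0.6),
          Box(10.0, 10.0, 4.0, 2.0, 0.5)]
for b in fuse_boxes(pooled):
    print(b)
```

Averaging the scores rewards boxes found by several detectors only weakly. A fuller scheme might also count how many source detectors contributed to each cluster, or down-weight clusters supported by a single detector.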
Submitted 4 September, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Bayesian Record Linkage with Variables in One File
Authors:
Gauri Kamat,
Mingyang Shan,
Roee Gutman
Abstract:
In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms rely only on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linkage methods allow for natural propagation of linkage error by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically and through simulations that this method can improve the linking process and can yield accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare Enrollment records.
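For context, many common record linkage algorithms that rely only on linking variables shared by all files build on the Fellegi-Sunter model. That model scores each record pair by summing per-field log-likelihood ratios of agreement. The sketch below shows this baseline scoring rule, not the paper's Bayesian extension. The m- and u-probabilities are made-up illustrative values, and `match_weight` is a hypothetical name.

```python
import math
from typing import Sequence


def match_weight(agreements: Sequence[bool],
                 m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter log-likelihood-ratio weight for one record pair.

    m[k]: P(field k agrees | the records are a true match)
    u[k]: P(field k agrees | the records are not a match)
    Larger totals favour declaring a link.
    """
    total = 0.0
    for agree, mk, uk in zip(agreements, m, u):
        if agree:
            total += math.log(mk / uk)
        else:
            total += math.log((1.0 - mk) / (1.0 - uk))
    return total


# Surname agrees, birth year disagrees (illustrative probabilities).
w = match_weight([True, False], m=[0.9, 0.8], u=[0.1, 0.2])
print(round(w, 3))  # 0.811
```

A score like this uses only fields present in both files. Variables exclusive to one file cannot enter it, which is the limitation the Bayesian extension described above addresses.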
Submitted 30 August, 2023; v1 submitted 10 August, 2023;
originally announced August 2023.
-
A Transfer Learning Framework for Proactive Ramp Metering Performance Assessment
Authors:
Xiaobo Ma,
Adrian Cottam,
Mohammad Razaur Rahman Shaon,
Yao-Jan Wu
Abstract:
Transportation agencies need to assess ramp metering performance when deploying or expanding a ramp metering system. The evaluation of a ramp metering strategy is primarily centered around examining its impact on freeway traffic mobility. One way these effects can be explored is by comparing traffic states, such as speed, before and after the ramp metering strategy is altered. Predicting freeway traffic states for the after scenarios following the implementation of a new ramp metering control strategy could offer valuable insights into the potential effectiveness of the target strategy. However, the use of machine learning methods in predicting the freeway traffic state for the after scenarios and evaluating the effectiveness of transportation policies or traffic control strategies such as ramp metering is somewhat limited in the current literature. To bridge this research gap, this study presents a framework for predicting freeway traffic parameters (speed, occupancy, and flow rate) for the after situations when a new ramp metering control strategy is implemented. By learning the association between the spatial-temporal features of traffic states in before and after situations for known freeway segments, the proposed framework can transfer this learning to predict the traffic parameters for new freeway segments. The proposed framework is built upon a transfer learning model. Experimental results show that the proposed framework is feasible for use as an alternative for predicting freeway traffic parameters to proactively evaluate ramp metering performance.
Submitted 7 August, 2023;
originally announced August 2023.
-
Sensitivity Analysis for Unmeasured Confounding in Medical Product Development and Evaluation Using Real World Evidence
Authors:
Peng Ding,
Yixin Fang,
Doug Faries,
Susan Gruber,
Hana Lee,
Joo-Yeon Lee,
Pallavi Mishra-Kalyani,
Mingyang Shan,
Mark van der Laan,
Shu Yang,
Xiang Zhang
Abstract:
The American Statistical Association Biopharmaceutical Section (ASA BIOP) working group on real-world evidence (RWE) has been making a sustained effort towards supporting and advancing regulatory science with respect to non-interventional clinical studies intended to use real-world data for evidence generation for the purpose of medical product development and evaluation (i.e., RWE studies). In 2023, the working group published a manuscript delineating challenges and opportunities in constructing estimands for RWE studies following the framework in the ICH E9(R1) guidance on estimands and sensitivity analysis. As a follow-up, we address another issue in RWE studies: sensitivity analysis. Focusing on unmeasured confounding, we review the availability and applicability of sensitivity analysis methods for different types of unmeasured confounding. We discuss considerations for the choice and use of sensitivity analysis in RWE studies. An updated version of this article will present how findings from sensitivity analysis could support regulatory decision-making using a real example.
Submitted 14 July, 2023;
originally announced July 2023.
-
LightFormer: An End-to-End Model for Intersection Right-of-Way Recognition Using Traffic Light Signals and an Attention Mechanism
Authors:
Zhenxing Ming,
Julie Stephany Berrio,
Mao Shan,
Eduardo Nebot,
Stewart Worrall
Abstract:
For smart vehicles driving through signalised intersections, it is crucial to determine whether the vehicle has right of way given the state of the traffic lights. To address this issue, camera-based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left or turn right. This paper proposes a novel end-to-end intersection right-of-way recognition model called LightFormer to generate right-of-way status for available driving directions in complex urban intersections. The model includes a spatial-temporal inner structure with an attention mechanism, which incorporates features from past images to contribute to the classification of the current frame's right-of-way status. In addition, a modified, multi-weight ArcFace loss is introduced to enhance the model's classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.
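The abstract mentions a modified, multi-weight ArcFace loss without giving its form. For reference, the standard ArcFace loss (additive angular margin) that it modifies can be sketched as below. The multi-weight modification is not reproduced. The scale `s`, the margin `m`, and the helper names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def arcface_logits(embedding: Sequence[float],
                   class_weights: Sequence[Sequence[float]],
                   target: int, s: float = 30.0, m: float = 0.5) -> List[float]:
    """ArcFace logits: s*cos(theta_j) for non-target classes and
    s*cos(theta_y + m) for the target class (additive angular margin).

    Production implementations add a fallback for theta + m > pi; omitted here.
    """
    e = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(e, _normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # clamp so acos stays defined
        if j == target:
            cos_t = math.cos(math.acos(cos_t) + m)
        logits.append(s * cos_t)
    return logits


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy."""
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return lse - logits[target]


emb = [0.9, 0.1]
W = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per class
with_margin = cross_entropy(arcface_logits(emb, W, target=0, s=8.0, m=0.5), 0)
no_margin = cross_entropy(arcface_logits(emb, W, target=0, s=8.0, m=0.0), 0)
print(with_margin > no_margin)  # True
```

The margin makes the loss harder to satisfy for the correct class. This pushes embeddings of the same right-of-way status closer together in angle and separates the classes more clearly.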
Submitted 14 July, 2023;
originally announced July 2023.
-
Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection
Authors:
Minh-Quan Dao,
Julie Stephany Berrio,
Vincent Frémont,
Mao Shan,
Elwan Héry,
Stewart Worrall
Abstract:
Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic, where the ego vehicle must have reliable object detection to avoid collision while its field of view is severely reduced by the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspectives of connected agents at multiple locations to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach in which Bird's-Eye View (BEV) images of point clouds are exchanged, so that bandwidth consumption is lower than communicating point clouds as in early collaboration, and detection performance is higher than late collaboration, which fuses agents' outputs, thanks to a deeper interaction among connected agents. While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressors/decompressors, and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of an early-collaboration method while only consuming the equivalent bandwidth of a late-collaboration method.
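Relaxing the inter-agent synchronization assumption means that information received from another agent may describe the scene at an earlier time than the ego frame. A common, simple compensation is to propagate each shared object state to the ego timestamp with a constant-velocity model. It is shown here only as an illustration, not as the paper's method. `Detection` and `align_to_time` are hypothetical names, and the states are assumed to already be in the ego coordinate frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Object state shared by a connected agent, already in the ego frame."""
    x: float   # metres
    y: float   # metres
    vx: float  # metres per second
    vy: float  # metres per second
    t: float   # capture timestamp, seconds


def align_to_time(dets: List[Detection], t_ego: float) -> List[Detection]:
    """Propagate each detection to the ego timestamp with constant velocity."""
    aligned = []
    for d in dets:
        dt = t_ego - d.t  # positive when the shared data is older than the ego frame
        aligned.append(Detection(d.x + d.vx * dt, d.y + d.vy * dt, d.vx, d.vy, t_ego))
    return aligned


# A car observed 0.5 s before the ego frame, moving at (2, 1) m/s.
print(align_to_time([Detection(0.0, 0.0, 2.0, 1.0, t=0.0)], t_ego=0.5))
# [Detection(x=1.0, y=0.5, vx=2.0, vy=1.0, t=0.5)]
```

Constant velocity is a reasonable first-order model for the short delays typical of V2X links. Larger delays or turning vehicles would call for a richer motion model.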
Submitted 19 September, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Integrating Randomized Placebo-Controlled Trial Data with External Controls: A Semiparametric Approach with Selective Borrowing
Authors:
Chenyin Gao,
Shu Yang,
Mingyang Shan,
Wenyu Ye,
Ilya Lipkovich,
Douglas Faries
Abstract:
In recent years, real-world external controls (ECs) have grown in popularity as a tool to empower randomized placebo-controlled trials (RPCTs), particularly in rare diseases or cases where balanced randomization is unethical or impractical. However, as ECs are not always comparable to the RPCTs, directly borrowing ECs without scrutiny may heavily bias the treatment effect estimator. Our paper proposes a data-adaptive integrative framework capable of preventing unknown biases of ECs. The adaptive nature is achieved by dynamically sorting out a set of comparable ECs via bias penalization. Our proposed method can simultaneously achieve (a) the semiparametric efficiency bound when the ECs are comparable and (b) selective borrowing that mitigates the impact of incomparable ECs. Furthermore, we establish statistical guarantees, including consistency, asymptotic distribution, and inference, providing type-I error control and good power. Extensive simulations and two real-data applications show that the proposed method improves performance over the RPCT-only estimator across various bias-generating scenarios.
Submitted 28 June, 2023;
originally announced June 2023.
-
MS3D: Leveraging Multiple Detectors for Unsupervised Domain Adaptation in 3D Object Detection
Authors:
Darren Tsai,
Julie Stephany Berrio,
Mao Shan,
Eduardo Nebot,
Stewart Worrall
Abstract:
We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite the remarkable accuracy of 3D detectors, they often overfit to specific domain biases, leading to suboptimal performance in various sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors possess distinct expertise on different unseen domains. MS3D leverages this by combining different pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well-suited for high-to-low beam domain adaptation and vice versa. Our method achieved state-of-the-art performance on all evaluated datasets, and we demonstrate that the pre-trained detector's source dataset has minimal impact on the fine-tuned result, making MS3D suitable for real-world applications.
Submitted 8 May, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Towards Real-Time Temporal Graph Learning
Authors:
Deniz Gurevin,
Mohsin Shan,
Tong Geng,
Weiwen Jiang,
Caiwen Ding,
Omer Khan
Abstract:
In recent years, graph representation learning, which aims to generate node embeddings that capture features of graphs, has gained significant popularity. One way to achieve this is to capture node sequences in a graph with random walks and then learn an embedding for each node using Word2Vec, a natural language processing technique. These embeddings are then used for deep learning on graph data in classification tasks such as link prediction or node classification. Prior work operates on pre-collected temporal graph data and is not designed to handle graph updates in real time. Real-world graphs change dynamically, and their entire temporal updates are not available upfront. In this paper, we propose an end-to-end graph learning pipeline that performs temporal graph construction, creates low-dimensional node embeddings, and trains multi-layer neural network models in an online setting. The training of the neural network models is identified as the main performance bottleneck, as it performs repeated matrix operations on many sequentially connected low-dimensional kernels. We propose to unlock fine-grained parallelism in these low-dimensional kernels to boost the performance of model training.
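As a rough illustration of the random-walk step on a graph that grows online, the sketch below keeps an append-only adjacency list and samples walks whose edge timestamps strictly increase. The class and method names and the strictly-increasing-time rule are assumptions for illustration, not the pipeline described in this paper. The resulting walks would then serve as "sentences" for a Word2Vec model (for example, gensim's `Word2Vec`), which is not shown here.

```python
import random
from collections import defaultdict


class TemporalGraph:
    """Append-only undirected graph whose edges arrive online with timestamps."""

    def __init__(self):
        self.adj = defaultdict(list)  # node -> list of (neighbor, timestamp)

    def add_edge(self, u, v, t):
        # New edges are simply appended as they arrive; nothing is recomputed.
        self.adj[u].append((v, t))
        self.adj[v].append((u, t))

    def temporal_walk(self, start, length, rng):
        """Sample a walk of at most `length` nodes with strictly increasing edge times."""
        walk = [start]
        current, last_t = start, float("-inf")
        for _ in range(length - 1):
            candidates = [(v, t) for v, t in self.adj[current] if t > last_t]
            if not candidates:
                break  # no time-respecting continuation; stop early
            current, last_t = rng.choice(candidates)
            walk.append(current)
        return walk

    def sample_walks(self, walks_per_node, length, seed=0):
        """Build a corpus of walks (one list of nodes per walk) from every node."""
        rng = random.Random(seed)
        return [
            self.temporal_walk(node, length, rng)
            for node in list(self.adj)
            for _ in range(walks_per_node)
        ]


# Usage: edges arrive one at a time, as in an online setting.
g = TemporalGraph()
for u, v, t in [(0, 1, 1), (1, 2, 2), (2, 3, 3)]:
    g.add_edge(u, v, t)
walk = g.temporal_walk(0, length=10, rng=random.Random(42))
corpus = g.sample_walks(walks_per_node=2, length=5)
```

Because new edges are only appended, walks can be resampled at any time without rebuilding the graph; how often to resample and retrain the embeddings is a separate design decision that this sketch does not address.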
Submitted 11 October, 2022; v1 submitted 8 October, 2022;
originally announced October 2022.
-
An Indirect Measurement of $^6$Li(n,$γ$) Cross Sections
Authors:
Midhun C. V,
M. M Musthafa,
S. V Suryanarayana,
Gokuldas H,
Shaima A,
Hajara. K,
Antony Joseph,
T. Santhosh,
A. Baishya,
A Pal,
P. C Rout,
S Santra,
P. T. M Shan,
Satheesh B,
B. V. John,
K. C Jagadeesan,
S. Ganesan
Abstract:
The $^6$Li(n,$γ$)$^7$Li cross sections in the neutron energy range of 0.6 to 4 MeV have been measured by an experimental implementation of the direct capture formalism. This was done by measuring the $γ$ transition probability experimentally and accounting for the spin factor by theoretical calculation. The electromagnetic transition probabilities from $^7$Li$^*$ states analogous to the initial neutron capture states of $^6$Li$+n$ were measured by populating the J$_i$ states of $^7$Li through the $^7$Li($p,p'$)$^7$Li$^*$ reaction. The impact of the coupling of resonant states above the neutron separation threshold of $^7$Li on neutron capture is observed in the capture $γ$ spectrum. The measured cross sections were reproduced through FRESCO and TALYS-1.95 direct capture calculations.
Submitted 25 September, 2022;
originally announced September 2022.
-
Emergent charge order and unconventional superconductivity in pressurized kagome superconductor CsV3Sb5
Authors:
Lixuan Zheng,
Zhimian Wu,
Ye Yang,
Linpeng Nie,
Min Shan,
Kuanglv Sun,
Dianwu Song,
Fanghang Yu,
Jian Li,
Dan Zhao,
Shunjiao Li,
Baolei Kang,
Yanbing Zhou,
Kai Liu,
Ziji Xiang,
Jianjun Ying,
Zhenyu Wang,
Tao Wu,
Xianhui Chen
Abstract:
The discovery of multiple electronic orders in kagome superconductors AV3Sb5 (A = K, Rb, Cs) provides a promising platform for exploring unprecedented emergent physics. Under moderate pressure (< 2.2 GPa), the triple-Q charge density wave (CDW) order is monotonically suppressed by pressure, while the superconductivity displays a two-dome-like behavior, suggesting an unusual interplay between superconductivity and CDW order. Given that time-reversal symmetry breaking and electronic nematicity have been revealed inside the triple-Q CDW phase, understanding this CDW order and its interplay with superconductivity becomes one of the core questions in AV3Sb5. Here, we report the evolution of CDW and superconductivity with pressure in CsV3Sb5 by 51V nuclear magnetic resonance measurements. An emergent CDW phase, ascribed to a possible stripe-like CDW order with a unidirectional 4a0 modulation, is observed between Pc1 ~ 0.58 GPa and Pc2 ~ 2.0 GPa, which explains the two-dome-like superconducting behavior under pressure. Furthermore, the nuclear spin-lattice relaxation measurement reveals evidence for pressure-independent charge fluctuations above the CDW transition temperature and unconventional superconducting pairing above Pc2. Our results not only shed new light on the interplay of superconductivity and CDW but also reveal novel electronic correlation effects in kagome superconductors AV3Sb5.
Submitted 15 September, 2022;
originally announced September 2022.
-
Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object Detection
Authors:
Darren Tsai,
Julie Stephany Berrio,
Mao Shan,
Eduardo Nebot,
Stewart Worrall
Abstract:
Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.
Submitted 14 September, 2022;
originally announced September 2022.
-
A Novel Probabilistic V2X Data Fusion Framework for Cooperative Perception
Authors:
Mao Shan,
Karan Narula,
Stewart Worrall,
Yung Fei Wong,
Julie Stephany Berrio Perez,
Paul Gray,
Eduardo Nebot
Abstract:
This paper addresses vehicle-to-X (V2X) data fusion for cooperative or collective perception (CP). This emerging and promising intelligent transportation systems (ITS) technology has enormous potential for improving the efficiency and safety of road transportation. Recent advances in V2X communication primarily address the definition of V2X messages and data dissemination amongst ITS stations (ITS-Ss) in a traffic environment. Yet a largely unsolved problem is how a connected vehicle (CV) can efficiently and consistently fuse its local perception information with the data received from other ITS-Ss. In this paper, we present a novel data fusion framework that fuses local and V2X perception data for CP while accounting for the presence of cross-correlation. The proposed approach is validated through comprehensive results obtained from numerical simulation, CARLA simulation, and real-world experimentation that incorporates V2X-enabled intelligent platforms. The real-world experiment includes a CV, a connected and automated vehicle (CAV), and an intelligent roadside unit (IRSU) retrofitted with vision and lidar sensors. We also demonstrate how the fused CP information can improve the awareness of vulnerable road users (VRU) for CV/CAV, and how this information can be considered in path planning/decision making within the CAV to facilitate safe interactions.
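When two estimates of the same object share an unknown cross-correlation (for example, a roadside unit's track and a vehicle's own track of one pedestrian), a standard way to fuse them without becoming overconfident is covariance intersection, where the fused information matrix is $P^{-1} = \omega A^{-1} + (1-\omega) B^{-1}$ with $\omega \in [0, 1]$. The sketch below is a minimal version of that generic technique, choosing $\omega$ by a grid search that minimizes the trace of the fused covariance. It is shown for illustration only and is not claimed to be the fusion framework proposed in this paper; the function name and the example numbers are hypothetical.

```python
import numpy as np


def covariance_intersection(a, A, b, B, num_steps=101):
    """Fuse estimates (a, A) and (b, B) whose cross-correlation is unknown.

    Fused information: inv(P) = w * inv(A) + (1 - w) * inv(B), with w in [0, 1]
    chosen here by grid search to minimize trace(P).
    """
    A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
    best = None
    for w in np.linspace(0.0, 1.0, num_steps):
        P = np.linalg.inv(w * A_inv + (1.0 - w) * B_inv)
        if best is None or np.trace(P) < np.trace(best[1]):
            x = P @ (w * A_inv @ a + (1.0 - w) * B_inv @ b)
            best = (x, P, float(w))
    return best  # (fused mean, fused covariance, chosen weight)


# Usage: each source is confident along a different axis.
a, A = np.array([0.0, 0.0]), np.diag([1.0, 4.0])   # e.g. local sensing
b, B = np.array([2.0, 2.0]), np.diag([4.0, 1.0])   # e.g. received V2X data
x, P, w = covariance_intersection(a, A, b, B)
```

In this example the chosen weight is 0.5 and the fused mean is [0.4, 1.6]: each coordinate leans towards the source that is more certain about it. Minimizing the trace is one common criterion; minimizing the determinant is another.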
Submitted 31 March, 2022;
originally announced March 2022.
-
StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation
Authors:
Roy Or-El,
Xuan Luo,
Mengyi Shan,
Eli Shechtman,
Jeong Joon Park,
Ira Kemelmacher-Shlizerman
Abstract:
We introduce a high-resolution, 3D-consistent image and shape generation technique which we call StyleSDF. Our method is trained on single-view RGB data only, and stands on the shoulders of StyleGAN2 for image generation, while solving two main challenges in 3D-aware GANs: 1) high-resolution, view-consistent generation of the RGB images, and 2) detailed 3D shape. We achieve this by merging an SDF-based 3D representation with a style-based 2D generator. Our 3D implicit network renders low-resolution feature maps, from which the style-based network generates view-consistent, 1024x1024 images. Notably, our SDF-based 3D modeling defines detailed 3D surfaces, leading to consistent volume rendering. Our method shows higher-quality results than the state of the art in terms of visual and geometric quality.
Submitted 29 March, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Using principal stratification in analysis of clinical trials
Authors:
Ilya Lipkovich,
Bohdana Ratitch,
Yongming Qu,
Xiang Zhang,
Mingyang Shan,
Craig Mallinckrodt
Abstract:
The ICH E9(R1) addendum (2019) proposed principal stratification (PS) as one of five strategies for dealing with intercurrent events. Therefore, understanding the strengths, limitations, and assumptions of PS is important for the broad community of clinical trialists. Many approaches have been developed under the general framework of PS in different areas of research, including experimental and observational studies. These applications have utilized a diverse set of tools and assumptions. Thus, there is a need to present these approaches in a unifying manner. The goal of this tutorial is threefold. First, we provide a coherent and unifying description of PS. Second, we emphasize that estimation of effects within PS relies on strong assumptions, and we thoroughly examine the consequences of these assumptions to understand in which situations certain assumptions are reasonable. Finally, we provide an overview of a variety of key methods for PS analysis and use a real clinical trial example to illustrate them. Examples of code for implementation of some of these approaches are given in supplemental materials.
Submitted 22 October, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain Adaptation
Authors:
Darren Tsai,
Julie Stephany Berrio,
Mao Shan,
Stewart Worrall,
Eduardo Nebot
Abstract:
Sampling discrepancies between different manufacturers and models of lidar sensors result in inconsistent representations of objects. This leads to performance degradation when 3D detectors trained for one lidar are tested on other types of lidars. Remarkable progress in lidar manufacturing has brought about advances in mechanical, solid-state, and recently, adjustable scan pattern lidars. For the latter, existing works often require fine-tuning the model each time scan patterns are adjusted, which is infeasible. We explicitly deal with the sampling discrepancy by proposing a novel unsupervised multi-target domain adaptation framework, SEE, for transferring the performance of state-of-the-art 3D detectors across both fixed and flexible scan pattern lidars without requiring fine-tuning of models by end-users. Our approach interpolates the underlying geometry and normalizes the scan pattern of objects from different lidars before passing them to the detection network. We demonstrate the effectiveness of SEE on public datasets, achieving state-of-the-art results, and additionally provide quantitative results on a novel high-resolution lidar to prove the industry applications of our framework.
Submitted 10 April, 2023; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Impact of $^7$Be breakup on $^7$Li(p,n) Neutron Spectrum
Authors:
Midhun C. V,
M. M Musthafa,
S. V Suryanarayana,
T. Santhosh,
A. Baishya,
P. Patil,
A Pal,
P. C Rout,
S Santra,
R. Kujur,
Antony Joseph,
Shaima A,
Hajara. K,
P. T. M Shan,
Satheesh B,
Y. Sawant,
B. V. John,
E. T Mirgule,
K. C Jagadeesan,
S. Ganesan
Abstract:
The formation of the continuum neutron distribution in $^7$Li(p,n) has been identified as due to the coupling of the $^7$Be breakup levels to the final state of the reaction. The continuum neutron spectra produced by the $^7$Li(p,n) reaction have been estimated by measuring the double differential cross sections for continuum and resonant breakup of $^7$Be through the $^7$Li(p,n)$^7$Be$^*$ reaction at 21 MeV proton energy. The breakup contributions from the continuum and the $5/2^-$ and $7/2^-$ states of $^7$Be have been identified. The measured double differential cross sections have been reproduced through CDCC-CRC calculations. The cross sections were projected to a neutron spectrum using a Monte-Carlo approach and validated using experimentally measured $^3$He-gated neutron spectra. The $^7$Li(p,n) neutron spectrum at 20 MeV incident proton energy measured by McNaughton et al. has been reproduced by adopting the estimated model parameters for the reaction.
Submitted 3 August, 2021;
originally announced August 2021.
-
ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
Authors:
Mo Shan,
Qiaojun Feng,
You-Yi Jau,
Nikolay Atanasov
Abstract:
Autonomous systems need to understand the semantics and geometry of their surroundings in order to comprehend and safely execute object-level task specifications. This paper proposes an expressive yet compact model for joint object pose and shape optimization, and an associated optimization algorithm to infer an object-level map from multi-view RGB-D camera observations. The model is expressive because it captures the identities, positions, orientations, and shapes of objects in the environment. It is compact because it relies on a low-dimensional latent representation of implicit object shape, allowing onboard storage of large multi-category object maps. Unlike other works that rely on a single object representation format, our approach uses a bi-level object model that captures both coarse-level scale and fine-level shape details. Our approach is evaluated on the large-scale real-world ScanNet dataset and compared against state-of-the-art methods.
Submitted 31 July, 2021;
originally announced August 2021.
-
Intrinsic Spin Susceptibility and Pseudogap-like Behavior in Infinite-Layer LaNiO2
Authors:
D. Zhao,
Y. B. Zhou,
Y. Fu,
L. Wang,
X. F. Zhou,
H. Cheng,
J. Li,
D. W. Song,
S. J. Li,
B. L. Kang,
L. X. Zheng,
L. P. Nie,
Z. M. Wu,
M. Shan,
F. H. Yu,
J. J. Ying,
S. M. Wang,
J. W. Mei,
T. Wu,
X. H. Chen
Abstract:
The recent discovery of superconductivity in doped infinite-layer nickelates has stimulated intensive interest, especially regarding similarities and differences compared with cuprate superconductors. In contrast to cuprates, although earlier magnetization measurements revealed Curie-Weiss-like behavior in undoped infinite-layer nickelates, no magnetic ordering has been observed by elastic neutron scattering down to liquid helium temperature. Until now, the nature of the magnetic ground state in undoped infinite-layer nickelates has remained elusive. Here, we perform a nuclear magnetic resonance (NMR) experiment on 139La nuclei to study the intrinsic spin susceptibility of infinite-layer LaNiO2. First, no signature of magnetic ordering or freezing appears in the 139La NMR spectrum down to 0.24 K, which unambiguously confirms a paramagnetic ground state in LaNiO2. Second, pseudogap-like behavior, instead of Curie-Weiss-like behavior, is observed in both the temperature-dependent Knight shift and the nuclear spin-lattice relaxation rate (1/T1); such behavior is widely observed in both underdoped cuprates and iron-based superconductors. Furthermore, the scaling behavior between the Knight shift and 1/T1T is also discussed. Finally, the present results imply a considerable exchange interaction in infinite-layer nickelates, which places a strong constraint on the proposed theoretical models.
Submitted 22 April, 2021;
originally announced April 2021.
-
Orbital ordering and fluctuations in a kagome superconductor CsV3Sb5
Authors:
D. W. Song,
L. X. Zheng,
F. H. Yu,
J. Li,
L. P. Nie,
M. Shan,
D. Zhao,
S. J. Li,
B. L. Kang,
Z. M. Wu,
Y. B. Zhou,
K. L. Sun,
K. Liu,
X. G. Luo,
Z. Y. Wang,
J. J. Ying,
X. G. Wan,
T. Wu,
X. H. Chen
Abstract:
Recently, competing electronic instabilities, including superconductivity and density-wave-like order, have been discovered in vanadium-based kagome metals AV3Sb5 (A = K, Rb, Cs) with a nontrivial band topology. This finding has stimulated wide interest in the interplay of these competing electronic orders and possible exotic excitations in the superconducting state. Here, in order to further clarify the nature of the density-wave-like transition in these kagome superconductors, we performed 51V and 133Cs nuclear magnetic resonance (NMR) measurements on a CsV3Sb5 single crystal. A first-order phase transition associated with orbital ordering is revealed by a sudden splitting of the orbital shift in the 51V NMR spectrum at the structural transition temperature Ts ~ 94 K. In contrast, the quadrupole splitting from a charge-density-wave (CDW) order in the 51V NMR spectrum appears only gradually below Ts, with typical second-order transition behavior, suggesting that the CDW order is a secondary electronic order. Moreover, combined with the 133Cs NMR spectrum, the present result also confirms a three-dimensional structural modulation with a 2ax2ax2c period. Above Ts, the temperature-dependent Knight shift and nuclear spin-lattice relaxation rate (1/T1) further indicate the existence of remarkable magnetic fluctuations from vanadium 3d orbitals, which are suppressed by orbital ordering below Ts. The present results strongly support that, besides CDW order, the previously claimed density-wave-like transition also involves a dominant orbital order, suggesting rich orbital physics in these kagome superconductors.
Submitted 19 April, 2021;
originally announced April 2021.
-
What is the appropriate speed for an autonomous vehicle? Designing a Pedestrian Aware Contextual Speed Controller
Authors:
Daniel Jiang,
Stewart Worrall,
Mao Shan
Abstract:
Social acceptance is a major hurdle for autonomous vehicle technology, central to which is ensuring that both passengers and nearby pedestrians feel safe. This idea of 'feeling safe' and perceived safety is highly subjective and rooted in human intuition. As such, traditional analytical approaches to autonomous navigation often fail to cater for the social expectations of individuals. Therefore, this paper proposes an approach to capture the complexity of social expectations and integrate this complexity into a 3-layered Contextual Speed Controller. The layers were: the legal road speed limit, the socially acceptable speed given the number of nearby pedestrians, and the socially acceptable speed based on proximity to nearby pedestrians. An implementation of this layered approach was tested in areas of both low and high vehicle-pedestrian interaction. From the experiments conducted, the lower two layers were seen working in tandem to modulate the vehicle speed to appropriate levels that mimicked conservative human driver behaviour. In summary, this work quantified the relationship between pedestrian context and socially acceptable vehicle speeds, allowing for autonomous driving that is perceived as safer. Furthermore, the need for different driving schemes for navigating different road environments was identified.
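A 3-layered speed controller of this kind can be sketched by letting each layer propose a speed cap and driving at the most restrictive one. This is a minimal sketch under stated assumptions: the min-combination rule, the linear shapes of the two pedestrian layers, and every numeric parameter (8 m/s free speed, 0.5 m/s reduction per pedestrian, 2 m stop distance, 10 m and 20 m radii) are hypothetical and are not taken from the paper.

```python
import math
from typing import Iterable, Tuple


def count_layer(n_peds: int, v_free: float = 8.0, per_ped: float = 0.5,
                v_floor: float = 2.0) -> float:
    """Hypothetical: lower the speed cap by a fixed amount per nearby pedestrian."""
    if n_peds == 0:
        return math.inf  # no pedestrians: this layer imposes no cap
    return max(v_floor, v_free - per_ped * n_peds)


def proximity_layer(dist: float, v_free: float = 8.0, d_stop: float = 2.0,
                    d_far: float = 10.0) -> float:
    """Hypothetical: ramp linearly from 0 at d_stop to v_free at d_far."""
    if dist <= d_stop:
        return 0.0
    if dist >= d_far:
        return v_free
    return v_free * (dist - d_stop) / (d_far - d_stop)


def contextual_speed(legal_limit: float, ego: Tuple[float, float],
                     pedestrians: Iterable[Tuple[float, float]],
                     radius: float = 20.0) -> float:
    """Target speed (m/s): the most restrictive of the three layers."""
    dists = [math.hypot(px - ego[0], py - ego[1]) for px, py in pedestrians]
    nearby = [d for d in dists if d <= radius]
    v_count = count_layer(len(nearby))
    v_prox = proximity_layer(min(nearby)) if nearby else math.inf
    return min(legal_limit, v_count, v_prox)


# Usage: a 50 km/h (about 13.9 m/s) road with one pedestrian 6 m away.
speed = contextual_speed(13.9, (0.0, 0.0), [(6.0, 0.0)])
```

With one pedestrian 6 m away, the count layer allows 7.5 m/s but the proximity layer allows only 4.0 m/s, so the proximity layer decides. Taking the minimum makes each layer a pure upper bound, which keeps the layers independent and easy to tune separately.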
Submitted 13 April, 2021;
originally announced April 2021.
-
Optimising the selection of samples for robust lidar camera calibration
Authors:
Darren Tsai,
Stewart Worrall,
Mao Shan,
Anton Lohr,
Eduardo Nebot
Abstract:
We propose a robust calibration pipeline that optimises the selection of calibration samples for the estimation of calibration parameters that fit the entire scene. We minimise user error by automating the data selection process according to a metric, called Variability of Quality (VOQ), that gives a score to each calibration set of samples. We show that this VOQ score is correlated with the estimated calibration parameters' ability to generalise well to the entire scene, thereby overcoming the overfitting problems of existing calibration algorithms. Our approach has the benefits of simplifying the calibration process for practitioners of any calibration expertise level and providing an objective measure of the quality of our calibration pipeline's input and output data. We additionally use a novel method of assessing the accuracy of the calibration parameters. It involves computing reprojection errors for the entire scene to ensure that the parameters are well fitted to all features in the scene. Our proposed calibration pipeline takes 90 s and obtains an average reprojection error of 1-1.2 cm, with a standard deviation of 0.4-0.5 cm, over 46 poses evenly distributed in a scene. This process has been validated by experimentation on a high-resolution, software-definable lidar, the Baraja Spectrum-Scan, and a low, fixed-resolution lidar, the Velodyne VLP-16. We have shown that despite the vast differences in lidar technologies, our proposed approach manages to estimate robust calibration parameters for both. Our code and the data set used for this paper are available as open source.
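A scene-wide reprojection check can be sketched as follows, assuming 3D lidar feature points with known pixel correspondences, a pinhole camera without lens distortion, and lidar-to-camera extrinsics (R, t). The function names and the synthetic camera are hypothetical, and this is not the paper's released code. In particular, the paper reports errors in centimetres, so the depth-based pixel-to-metric conversion below is only an illustrative approximation of how such a figure might be obtained.

```python
import numpy as np


def project_points(points_lidar, R, t, K):
    """Project N x 3 lidar points into the image with a pinhole camera model.

    R (3x3) and t (3,) map lidar coordinates into the camera frame; K is the
    3x3 intrinsic matrix. Returns pixel coordinates (N x 2) and depths (N,).
    """
    cam = points_lidar @ R.T + t      # lidar frame -> camera frame
    uvw = cam @ K.T                   # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]


def reprojection_errors(points_lidar, pixels, R, t, K):
    """Per-point reprojection error in pixels and an approximate metric error.

    The metric value rescales the pixel error by depth / focal length, which
    assumes square pixels (fx == fy); it is only a rough conversion.
    """
    proj, depth = project_points(points_lidar, R, t, K)
    err_px = np.linalg.norm(proj - pixels, axis=1)
    err_m = err_px * depth / K[0, 0]
    return err_px, err_m


# Usage with an identity extrinsic and a synthetic 640x480 camera.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]])
observed = np.array([[320.0, 240.0], [430.0, 240.0]])  # second point is 10 px off
err_px, err_m = reprojection_errors(points, observed, R, t, K)
```

Averaging such per-point errors over features spread across the whole scene, rather than only over the calibration target, is what distinguishes a scene-wide check from a fit to the samples used for calibration.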
Submitted 22 September, 2021; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Localization and Mapping using Instance-specific Mesh Models
Authors:
Qiaojun Feng,
Yue Meng,
Mo Shan,
Nikolay Atanasov
Abstract:
This paper focuses on building semantic maps, containing object poses and shapes, using a monocular camera. This is an important problem because robots need a rich understanding of geometry and context if they are to shape the future of transportation, construction, and agriculture. Our contribution is an instance-specific mesh model of object shape that can be optimized online based on semantic information extracted from camera images. Multi-view constraints on the object shape are obtained by detecting objects and extracting category-specific keypoints and segmentation masks. We show that the errors between projections of the mesh model and the observed keypoints and masks can be differentiated in order to obtain accurate instance-specific object shapes. We evaluate the performance of the proposed approach in simulation and on the KITTI dataset by building maps of car poses and shapes.
Submitted 7 March, 2021;
originally announced March 2021.
-
Socially Aware Crowd Navigation with Multimodal Pedestrian Trajectory Prediction for Autonomous Vehicles
Authors:
Kunming Li,
Mao Shan,
Karan Narula,
Stewart Worrall,
Eduardo Nebot
Abstract:
Seamlessly operating an autonomous vehicle in a crowded pedestrian environment is a very challenging task, because human movement and interactions are very hard to predict in such environments. Recent work has demonstrated that reinforcement learning-based methods can learn to drive in crowds. However, these methods can perform very poorly due to inaccurate predictions of the pedestrians' future states, as human motion prediction has a large variance. To overcome this problem, we propose a new method, SARL-SGAN-KCE, that combines a deep socially aware attentive value network with a multimodal human trajectory prediction model to help identify the optimal driving policy. We also introduce a novel technique to extend the discrete action space with minimal additional computational requirements. The kinematic constraints of the vehicle are also considered to ensure smooth and safe trajectories. We evaluate our method against state-of-the-art methods for crowd navigation and provide an ablation study showing that our method is safer and closer to human behaviour.
Submitted 22 November, 2020;
originally announced November 2020.
-
Attentional-GCNN: Adaptive Pedestrian Trajectory Prediction towards Generic Autonomous Vehicle Use Cases
Authors:
Kunming Li,
Stuart Eiffert,
Mao Shan,
Francisco Gomez-Donoso,
Stewart Worrall,
Eduardo Nebot
Abstract:
Autonomous vehicle navigation in shared pedestrian environments requires the ability to predict future crowd motion both accurately and with minimal delay. Understanding the uncertainty of the prediction is also crucial. However, most existing approaches can only estimate uncertainty through repeated sampling of generative models. Additionally, most current predictive models are trained on datasets that assume complete observability of the crowd from an aerial view. These are generally not representative of real-world usage from a vehicle perspective, and can lead to the underestimation of uncertainty bounds when the on-board sensors are occluded. Inspired by prior work in motion prediction using spatio-temporal graphs, we propose a novel Graph Convolutional Neural Network (GCNN)-based approach, Attentional-GCNN, which aggregates information about implicit interactions between pedestrians in a crowd by assigning attention weights to the edges of the graph. Our model can be trained to output either a probabilistic distribution or a faster deterministic prediction, demonstrating applicability to autonomous vehicle use cases where either speed or accuracy with uncertainty bounds is required. To further improve the training of predictive models, we propose an automatically labelled pedestrian dataset collected from an intelligent vehicle platform representative of real-world use. Through experiments on a number of datasets, we show that our proposed method improves on the state of the art by 10% in Average Displacement Error (ADE) and 12% in Final Displacement Error (FDE), with fast inference speeds.
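ADE and FDE are standard trajectory-prediction metrics: ADE averages the Euclidean error over every predicted time step, and FDE uses only the final step. The sketch below computes both for deterministic predictions and shows how a percentage improvement is usually expressed. The array layout is an assumption; probabilistic models are often scored with a best-of-K protocol instead, and the abstract does not say which protocol was used.

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: arrays of shape (num_pedestrians, num_future_steps, 2) in metres."""
    dists = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=-1)
    ade = dists.mean()          # average over every pedestrian and every future step
    fde = dists[:, -1].mean()   # average over pedestrians at the final step only
    return float(ade), float(fde)

def relative_improvement(baseline, ours):
    """Fractional error reduction, e.g. 0.10 for a 10% improvement."""
    return (baseline - ours) / baseline

gt = np.zeros((1, 3, 2))
pred = gt.copy()
pred[0, :, 0] = [1.0, 2.0, 3.0]   # errors of 1, 2 and 3 m at successive steps
```

For this toy pedestrian, `ade_fde(pred, gt)` gives an ADE of 2.0 m and an FDE of 3.0 m.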
Submitted 22 November, 2020;
originally announced November 2020.
-
Demonstrations of Cooperative Perception: Safety and Robustness in Connected and Automated Vehicle Operations
Authors:
Mao Shan,
Karan Narula,
Yung Fei Wong,
Stewart Worrall,
Malik Khan,
Paul Alexander,
Eduardo Nebot
Abstract:
Cooperative perception, or collective perception (CP), is an emerging and promising technology for intelligent transportation systems (ITS). It enables an ITS station (ITS-S) to share its local perception information with others by means of vehicle-to-X (V2X) communication, thereby improving efficiency and safety in road transportation. In this paper, we present our recent progress on the development of a connected and automated vehicle (CAV) and an intelligent roadside unit (IRSU). We present three experiments demonstrating the use of the CP service within intelligent infrastructure to improve awareness of vulnerable road users (VRU), and thus safety for CAVs, in various traffic scenarios. We demonstrate in the experiments that a connected vehicle (CV) can "see" a pedestrian around corners. More importantly, we demonstrate how CAVs can autonomously and safely interact with walking and running pedestrians, relying only on the CP information from the IRSU through vehicle-to-infrastructure (V2I) communication. This is one of the first demonstrations of urban vehicle automation using only CP information. We also address the handling of collective perception messages (CPMs) received from the IRSU and passing them through a pipeline of CP information coordinate transformation with uncertainty, multiple road user tracking, and finally path planning/decision making within the CAV. The experimental results were obtained with a manually driven CV, a fully autonomous CAV, and an IRSU retrofitted with vision and laser sensors and a road user tracking system.
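The "coordinate transformation with uncertainty" step can be illustrated with a minimal planar sketch. It maps a road user's 2D position estimate (mean and covariance) from the IRSU frame into the vehicle frame, using first-order propagation that also accounts for yaw and translation uncertainty in the transform. The planar model, the function name and the independence of the error sources are assumptions, not the pipeline described in the paper.

```python
import numpy as np

def transform_gaussian_2d(mean, cov, yaw, trans, yaw_var=0.0, trans_cov=None):
    """Map a 2D position estimate (mean, cov) from the IRSU frame into the vehicle
    frame using a planar rigid transform (yaw, trans), with first-order propagation.
    Hypothetical helper: assumes object, yaw and translation errors are independent."""
    mean = np.asarray(mean, float)
    cov = np.asarray(cov, float)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    dR = np.array([[-s, -c], [c, -s]])      # derivative of R with respect to yaw
    new_mean = R @ mean + np.asarray(trans, float)
    J_yaw = (dR @ mean).reshape(2, 1)       # sensitivity of the output to yaw error
    new_cov = R @ cov @ R.T + yaw_var * (J_yaw @ J_yaw.T)
    if trans_cov is not None:
        new_cov = new_cov + np.asarray(trans_cov, float)
    return new_mean, new_cov
```

The yaw term grows with the object's distance from the IRSU, so a small heading error in the transform can dominate the covariance of far-away pedestrians.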
Submitted 13 January, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
A Cross-Verification Approach for Protecting World Leaders from Fake and Tampered Audio
Authors:
Mengyi Shan,
TJ Tsai
Abstract:
This paper tackles the problem of verifying the authenticity of speech recordings from world leaders. Whereas previous work on detecting deepfake or tampered audio focuses on scrutinizing an audio recording in isolation, we instead reframe the problem and focus on cross-verifying a questionable recording against trusted references. We present a two-step method for cross-verifying a speech recording against a reference: aligning the two recordings, then classifying each query frame as matching or non-matching. We propose a subsequence alignment method based on the Needleman-Wunsch algorithm and show that it significantly outperforms dynamic time warping in handling common tampering operations. We also explore several binary classification models based on LSTM and Transformer architectures to verify content at the frame level. Through extensive experiments on tampered speech recordings of Donald Trump, we show that our system can reliably detect audio tampering operations of different types and durations. Our best model achieves 99.7% accuracy for the alignment task at an error tolerance of 50 ms and a 0.43% equal error rate in classifying audio frames as matching or non-matching.
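For reference, the sketch below implements the classic Needleman-Wunsch global alignment score on symbol sequences with match, mismatch and gap scores. The `subsequence=True` option shows one common way to obtain a subsequence variant: leading and trailing gaps in the reference cost nothing, so a short query can align to any stretch of a longer reference. This is a textbook illustration only. The paper's variant, its frame-level audio features and its scoring are not described in the abstract.

```python
def needleman_wunsch(query, ref, match=1, mismatch=-1, gap=-1, subsequence=False):
    """Return the best alignment score of `query` against `ref`.

    subsequence=False: classic global alignment (both sequences fully aligned).
    subsequence=True:  the query is fully aligned, but skipping reference
                       elements before and after the match is free.
    """
    n, m = len(query), len(ref)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                      # query symbols aligned to gaps
    for j in range(1, m + 1):
        F[0][j] = 0 if subsequence else j * gap  # free start in the reference
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Free end in the reference: take the best score anywhere in the last row.
    return max(F[n]) if subsequence else F[n][m]
```

Unlike DTW, which must consume every frame of both sequences, the explicit gap penalty lets inserted or deleted audio show up as gaps rather than as warped matches.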
Submitted 23 October, 2020;
originally announced October 2020.
-
Long-term map maintenance pipeline for autonomous vehicles
Authors:
Julie Stephany Berrio,
Stewart Worrall,
Mao Shan,
Eduardo Nebot
Abstract:
For autonomous vehicles to operate persistently in a typical urban environment, it is essential to have highly accurate position information. This requires a mapping and localisation system that can adapt to changes over time. A localisation approach based on a single-survey map will not be suitable for long-term operation, as it does not incorporate variations in the environment. In this paper, we present new algorithms to maintain a feature-based map. A map maintenance pipeline is proposed that can continuously update a map with the most relevant features, taking advantage of the changes in the surroundings. Our pipeline detects and removes transient features based on their geometrical relationships with the vehicle's pose. Newly identified features become part of a new feature map and are assessed by the pipeline as candidates for the localisation map. By purging out-of-date features and adding newly detected features, we continually update the prior map to more accurately represent the most recent environment. We have validated our approach using the USyd Campus Dataset, which includes more than 18 months of data. The results demonstrate that our maintenance pipeline produces a resilient map that can provide sustained localisation performance over time.
Submitted 27 August, 2020;
originally announced August 2020.
-
Harnack's inequality for quasilinear elliptic equations with generalized Orlicz growth
Authors:
M. A. Shan,
I. I. Skrypnik,
M. V. Voitovych
Abstract:
We prove Harnack's inequality for bounded weak solutions to quasilinear second order elliptic equations with generalized Orlicz growth conditions. Our approach covers new cases of variable exponent and (p,q) growth conditions.
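For orientation, the classical form of Harnack's inequality is shown below for nonnegative weak solutions of the $p$-Laplace equation, which is the constant-exponent special case $\varphi(x,t)=t^p$ of generalized Orlicz growth. The generalized Orlicz version proved in the paper may carry additional structure, for example constants that depend on the bound of the solution or on the radius. The abstract does not specify this, so it is not reproduced here.

```latex
-\operatorname{div}\big(|\nabla u|^{p-2}\nabla u\big) = 0 \ \text{ in } \Omega, \quad u \ge 0
\quad\Longrightarrow\quad
\sup_{B_\rho(x_0)} u \;\le\; C(n,p)\,\inf_{B_\rho(x_0)} u
\quad \text{whenever } B_{2\rho}(x_0) \subset \Omega .
```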
Submitted 9 August, 2020;
originally announced August 2020.
-
OrcVIO: Object residual constrained Visual-Inertial Odometry
Authors:
Mo Shan,
Vikas Dhiman,
Qiaojun Feng,
**zhao Li,
Nikolay Atanasov
Abstract:
Introducing object-level semantic information into simultaneous localization and mapping (SLAM) systems is critical. It not only improves performance but also enables tasks specified in terms of meaningful objects. This work presents OrcVIO, a method for visual-inertial odometry tightly coupled with tracking and optimization over structured object models. OrcVIO differentiates through semantic feature and bounding-box reprojection errors to perform batch optimization over the pose and shape of objects. The estimated object states aid in real-time incremental optimization over the IMU-camera states. OrcVIO's accuracy in trajectory estimation and its ability to perform large-scale object-level mapping are evaluated using real data.
Submitted 29 May, 2021; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Improved Handling of Repeats and Jumps in Audio-Sheet Image Synchronization
Authors:
Mengyi Shan,
TJ Tsai
Abstract:
This paper studies the problem of automatically generating piano score following videos given an audio recording and raw sheet music images. Whereas previous works focus on synthetic sheet music where the data has been cleaned and preprocessed, we instead focus on developing a system that can cope with the messiness of raw, unprocessed sheet music PDFs from IMSLP. We investigate how well existing systems cope with real scanned sheet music, filler pages and unrelated pieces or movements, and discontinuities due to jumps and repeats. We find that a significant bottleneck in system performance is handling jumps and repeats correctly. In particular, we find that a previously proposed Jump DTW algorithm does not perform robustly when jump locations are unknown a priori. We propose a novel alignment algorithm called Hierarchical DTW that can handle jumps and repeats even when jump locations are not known. It first performs alignment at the feature level on each sheet music line, and then performs a second alignment at the segment level. By operating at the segment level, it is able to encode domain knowledge about how likely a particular jump is. Through carefully controlled experiments on unprocessed sheet music PDFs from IMSLP, we show that Hierarchical DTW significantly outperforms Jump DTW in handling various types of jumps.
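Both Jump DTW and Hierarchical DTW build on standard dynamic time warping. The sketch below shows only that baseline: an accumulated-cost matrix over 1D feature sequences with the three usual step directions. It has no mechanism for jumps or repeats, which is exactly the limitation the paper addresses. The absolute-difference cost is an assumption made for illustration.

```python
import numpy as np

def dtw_cost(x, y):
    """Standard DTW alignment cost between 1D sequences x and y (no jumps)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Steps: advance in x only, in y only, or in both sequences.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Because every path must advance monotonically from start to end, a performance that repeats a section forces DTW into a costly diagonal detour. Allowing explicit jumps at line or segment boundaries avoids this.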
Submitted 29 July, 2020;
originally announced July 2020.
-
Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping
Authors:
Julie Stephany Berrio,
Mao Shan,
Stewart Worrall,
Eduardo Nebot
Abstract:
An automated vehicle operating in an urban environment must be able to perceive and recognise objects and obstacles in a three-dimensional world while navigating in a constantly changing environment. In order to plan and execute accurate, sophisticated driving maneuvers, a high-level contextual understanding of the surroundings is essential. Due to recent progress in image processing, it is now possible to obtain high-definition semantic information in 2D from monocular cameras, though cameras cannot reliably provide the highly accurate 3D information provided by lasers. The fusion of these two sensor modalities can overcome the shortcomings of each individual sensor, though a number of important challenges need to be addressed in a probabilistic manner. In this paper, we address the common yet challenging lidar/camera/semantic fusion problems, which are seldom approached in a wholly probabilistic manner. Our approach is capable of using a multi-sensor platform to build a three-dimensional semantic voxelized map that considers the uncertainty of all of the processes involved. We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and heuristic label probabilities for the semantic images. We also present a novel and efficient viewpoint validation algorithm to check for occlusions from the camera frames. A probabilistic projection is performed from the camera images to the lidar point cloud. Each labelled lidar scan then feeds into an octree map building algorithm that updates the class probabilities of the map voxels every time a new observation is available. We validate our approach using a set of qualitative and quantitative experimental tests on the USyd Dataset.
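Updating "the class probabilities of the map voxels every time a new observation is available" is commonly done with a recursive Bayesian update. The prior class distribution is multiplied by the observation's class likelihood and then renormalised. The sketch below assumes independent observations and uses a flat hash map of voxels as a stand-in for an octree. Both the class name and the voxel size are hypothetical, and the paper's exact update rule may differ (for example, it may clamp probabilities).

```python
import numpy as np

class SemanticVoxelMap:
    """Hypothetical sparse voxel grid storing a class distribution per voxel
    (a flat dict used in place of the octree described in the paper)."""

    def __init__(self, num_classes, voxel_size=0.2):
        self.num_classes = num_classes
        self.voxel_size = voxel_size
        self.voxels = {}  # (i, j, k) integer voxel index -> probability vector

    def key(self, point):
        idx = np.floor(np.asarray(point, float) / self.voxel_size).astype(int)
        return tuple(int(v) for v in idx)

    def update(self, point, class_likelihood):
        """Fuse one labelled lidar point: posterior is proportional to prior * likelihood."""
        k = self.key(point)
        uniform = np.full(self.num_classes, 1.0 / self.num_classes)
        prior = self.voxels.get(k, uniform)
        posterior = prior * np.asarray(class_likelihood, float)
        total = posterior.sum()
        if total > 0:  # skip degenerate observations instead of dividing by zero
            self.voxels[k] = posterior / total
        return self.voxels.get(k, prior)
```

Two agreeing observations of `[0.8, 0.2]` in the same voxel sharpen the belief from 0.8 to about 0.94 for the first class.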
Submitted 9 July, 2020;
originally announced July 2020.
-
Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Prediction using a Graph Vehicle-Pedestrian Attention Network
Authors:
Stuart Eiffert,
Kunming Li,
Mao Shan,
Stewart Worrall,
Salah Sukkarieh,
Eduardo Nebot
Abstract:
Understanding and predicting the intention of pedestrians is essential to enable autonomous vehicles and mobile robots to navigate crowds. This problem becomes increasingly complex when we consider the uncertainty and multimodality of pedestrian motion, as well as the implicit interactions between members of a crowd, including any response to a vehicle. Our approach, Probabilistic Crowd GAN, extends recent work in trajectory prediction, combining Recurrent Neural Networks (RNNs) with Mixture Density Networks (MDNs) to output probabilistic multimodal predictions, from which likely modal paths are found and used for adversarial training. We also propose the use of a Graph Vehicle-Pedestrian Attention Network (GVAT), which models social interactions and allows input of a shared vehicle feature, and show that including this module improves trajectory prediction both with and without the presence of a vehicle. Through evaluation on various datasets, we demonstrate improvements over existing state-of-the-art methods for trajectory prediction and illustrate how the true multimodal and uncertain nature of crowd interactions can be directly modelled.
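An MDN head outputs mixture weights, means and spreads for each future position, and it is typically trained by minimising the negative log-likelihood of the ground-truth position under that mixture. The sketch below evaluates this loss for a 2D Gaussian mixture with diagonal covariances, using log-sum-exp for numerical stability. The diagonal covariance is a simplifying assumption, since many trajectory MDNs use bivariate Gaussians with a correlation term. Mode extraction and the GAN components are not shown.

```python
import numpy as np

def gmm_nll(point, weights, means, sigmas):
    """Negative log-likelihood of a 2D point under a Gaussian mixture.

    weights: (K,) mixture weights summing to 1
    means:   (K, 2) component means
    sigmas:  (K, 2) per-axis standard deviations (diagonal covariance assumed)
    """
    point = np.asarray(point, float)
    weights = np.asarray(weights, float)
    means = np.asarray(means, float)
    sigmas = np.asarray(sigmas, float)
    z = (point - means) / sigmas                                 # standardised residuals
    log_norm = -np.log(2 * np.pi) - np.log(sigmas).sum(axis=1)   # 2D Gaussian normaliser
    log_comp = log_norm - 0.5 * (z ** 2).sum(axis=1)
    log_mix = np.log(weights) + log_comp
    peak = log_mix.max()                                         # log-sum-exp trick
    return float(-(peak + np.log(np.exp(log_mix - peak).sum())))
```

For a single standard normal component evaluated at its mean, the loss is `log(2*pi)`, about 1.838.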
Submitted 12 July, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
A Bayesian Multi-Layered Record Linkage Procedure to Analyze Functional Status of Medicare Patients with Traumatic Brain Injury
Authors:
Mingyang Shan,
Kali Thomas,
Roee Gutman
Abstract:
Understanding the association between injury severity and patients' potential for recovery is crucial to providing better care for patients with traumatic brain injury (TBI). Estimation of this relationship requires clinical information on injury severity, patient demographics, and healthcare utilization, which are often obtained from separate data sources. Because of privacy and confidentiality regulations, these data sources do not include unique identifiers to link records across data sources. Record linkage is a process to identify records that represent the same entity across data sources in the absence of unique identifiers. These processes commonly rely on agreement between variables that appear in both data sources to link records. However, when the number of records in each file is large, this task is computationally intensive and may result in false links. Blocking is a data partitioning technique that reduces the number of possible links that should be considered. Healthcare providers can be used as blocks in applications of record linkage with healthcare datasets. However, providers may not be uniquely identified across files. We propose a Bayesian record linkage procedure that simultaneously performs block-level and record-level linkage. This iterative approach incorporates the record-level linkage within block pairs to improve the accuracy of the block-level linkage. Subsequently, the algorithm improves record-level linkage using the accurate partitioning of the linkage space through blocking. We demonstrate that our proposed method provides improved performance compared to existing Bayesian record linkage methods that do not incorporate blocking. The proposed procedure is then used to merge registry data from the National Trauma Data Bank with Medicare claims data to estimate the relationship between injury severity and TBI patients' recovery.
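The computational benefit of blocking is easy to quantify. Without blocking, every record in one file is a candidate match for every record in the other. With blocking, only pairs that share a block, such as the same healthcare provider, are compared. The sketch below counts both numbers. It assumes exact, shared block keys, which is precisely the assumption the proposed method relaxes by linking blocks probabilistically when providers are not uniquely identified. The function name and keys are hypothetical.

```python
from collections import Counter

def candidate_pairs(block_keys_a, block_keys_b):
    """Return (pairs compared with blocking, pairs compared without blocking).

    Each list holds one block key per record (hypothetical provider IDs).
    """
    counts_a, counts_b = Counter(block_keys_a), Counter(block_keys_b)
    blocked = sum(n * counts_b[key] for key, n in counts_a.items() if key in counts_b)
    full = len(block_keys_a) * len(block_keys_b)
    return blocked, full

file_a = ["h1", "h1", "h2", "h2", "h2"]
file_b = ["h1", "h1", "h1", "h1", "h2"]
```

Here blocking reduces the comparisons from 25 to 2*4 + 3*1 = 11. The savings grow quadratically with file size, but a record placed in the wrong block can never be linked, which is why accurate block-level linkage matters.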
Submitted 18 May, 2020;
originally announced May 2020.
-
Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages
Authors:
TJ Tsai,
Daniel Yang,
Mengyi Shan,
Thitaree Tanprasert,
Teerapat Jenrungrot
Abstract:
This article investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of several lines of sheet music. This problem is challenging for two reasons: it has a significant runtime constraint since it is a user-facing application, and there is very little relevant training data containing cell phone images of sheet music. To solve this problem, we introduce a novel feature representation called a bootleg score which encodes the position of noteheads relative to staff lines in sheet music. The MIDI representation can be converted into a bootleg score using deterministic rules of Western musical notation, and the sheet music image can be converted into a bootleg score using classical computer vision techniques for detecting simple geometrical shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we can estimate the alignment using dynamic programming. The most notable characteristic of our system is that it has no trainable weights at all -- only a set of about 40 hyperparameters. With a training set of just 400 images, we show that our system generalizes well to a much larger set of 1600 test images from 160 unseen musical scores. Our system achieves a test F measure score of 0.89, has an average runtime of 0.90 seconds, and outperforms baseline systems based on music object detection and sheet-audio alignment. We provide extensive experimental validation and analysis of our system.
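Retrieving a passage means finding where a short query (the bootleg score from the photo) best matches inside a much longer reference (the bootleg score derived from the MIDI file). A standard dynamic-programming tool for this is subsequence DTW, in which the alignment may start and end anywhere in the reference. The sketch below assumes each bootleg score is a sequence of numeric feature vectors and uses Euclidean distance with the three basic DTW steps. The exact encoding, cost function and step sizes used in the paper are not given in the abstract, so treat these as illustrative choices.

```python
import numpy as np

def subsequence_dtw(query, ref):
    """Align all of `query` (n x d) to the best-matching stretch of `ref` (m x d).

    Returns (alignment cost, index in ref where the best match ends).
    """
    query = np.asarray(query, float)
    ref = np.asarray(ref, float)
    n, m = len(query), len(ref)
    # Pairwise Euclidean distances between query and reference feature vectors.
    C = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0  # free start: the match may begin at any reference position
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    end = int(np.argmin(D[n, 1:]))  # free end: best cost anywhere in the last row
    return float(D[n, end + 1]), end
```

For example, aligning the query `[[2], [3]]` to the reference `[[1], [2], [3], [4]]` yields a cost of 0 and an end index of 2. The start index could be recovered by backtracking through `D`.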
Submitted 21 April, 2020;
originally announced April 2020.
-
MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music
Authors:
Daniel Yang,
Thitaree Tanprasert,
Teerapat Jenrungrot,
Mengyi Shan,
TJ Tsai
Abstract:
This paper investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of a physical page of sheet music. While audio-sheet music retrieval has been explored by a number of works, this scenario is novel in that the query is a cell phone picture rather than a digital scan. To solve this problem, we introduce a mid-level feature representation called a bootleg score, which explicitly encodes the rules of Western musical notation. We convert both the MIDI and the sheet music into bootleg scores using deterministic rules of music and classical computer vision techniques for detecting simple geometric shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we estimate the alignment using dynamic programming. The most notable characteristic of our system is that it does test-time adaptation and has no trainable weights at all, only a set of about 30 hyperparameters. On a dataset containing 1000 cell phone pictures taken of 100 scores of classical piano music, our system achieves an F measure score of 0.869 and outperforms baseline systems based on commercial optical music recognition software.
Submitted 21 April, 2020;
originally announced April 2020.
-
Resonant Decompositions and Global Well-posedness for 2D Zakharov-Kuznetsov Equation in Sobolev spaces of Negative Indices
Authors:
Minjie Shan,
Baoxiang Wang,
Liqun Zhang
Abstract:
The Cauchy problem for the Zakharov-Kuznetsov equation on $\mathbb{R}^2$ is shown to be globally well-posed for initial data in $H^{s}$ provided $s>-\frac{1}{13}$. As conservation laws are invalid in Sobolev spaces below $L^2$, we construct an almost conserved quantity using a multilinear correction term, following the $I$-method introduced by Colliander, Keel, Staffilani, Takaoka and Tao. In contrast to the KdV equation, the main difficulty is handling the resonant interactions, which are significant due to the multidimensional and multilinear setting of the problem. The proof relies upon the bilinear Strichartz estimate and the nonlinear Loomis-Whitney inequality.
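For readers new to the $I$-method, the display below gives a common normalisation of the 2D Zakharov-Kuznetsov equation, whose $L^2$ mass $\int u^2$ is conserved for smooth solutions. It also shows the standard smoothing operator $I$ used for $s<0$. The operator acts as the identity at low frequencies and damps high frequencies, so that $Iu \in L^2$ for $u \in H^s$. The nonlinearity is sometimes written $u\,\partial_x u$ instead, which is equivalent up to rescaling $u$. The multilinear correction term that the paper adds to $\int (Iu)^2$ is not reproduced here.

```latex
\partial_t u + \partial_x \Delta u + \partial_x (u^2) = 0, \qquad u(0) = u_0 \in H^s(\mathbb{R}^2),
\\[4pt]
\widehat{Iu}(\xi) = m(\xi)\,\hat u(\xi), \qquad
m(\xi) =
\begin{cases}
1, & |\xi| \le N, \\
\left( |\xi| / N \right)^{s}, & |\xi| \ge 2N,
\end{cases}
\\[4pt]
\| Iu \|_{L^2} \lesssim N^{-s} \, \| u \|_{H^s} \qquad (s < 0).
```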
Submitted 16 March, 2020;
originally announced March 2020.
-
Probabilistic Egocentric Motion Correction of Lidar Point Cloud and Projection to Camera Images for Moving Platforms
Authors:
Mao Shan,
Julie Stephany Berrio,
Stewart Worrall,
Eduardo Nebot
Abstract:
The fusion of sensor data from heterogeneous sensors is crucial for robust perception in various robotics applications that involve moving platforms, for instance, autonomous vehicle navigation. In particular, combining camera and lidar sensors enables the projection of precise range information of the surrounding environment onto visual images. It also makes it possible to label each lidar point with visual segmentation/classification results for 3D mapping, which facilitates a higher-level understanding of the scene. The task is, however, non-trivial due to intrinsic and extrinsic sensor calibration and the distortion of lidar points resulting from the ego-motion of the platform. Despite the existence of many lidar ego-motion correction methods, the errors in the correction process due to uncertainty in ego-motion estimation cannot be removed completely. It is thus essential to treat the problem as a probabilistic process in which the ego-motion estimation uncertainty is modelled and considered consistently. This paper investigates probabilistic lidar ego-motion correction and lidar-to-camera projection, incorporating both the uncertainty in the ego-motion estimation and time jitter in sensory measurements. The proposed approach is validated both in simulation and using real-world data collected from an electric vehicle retrofitted with wide-angle cameras and a 16-beam scanning lidar.
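The sketch below illustrates ego-motion correction for a single lidar point in the plane. It assumes a constant forward speed `v` and yaw rate `omega`, and that the lidar sits at the vehicle origin. The point, captured `dt` seconds after the reference time, is moved into the reference frame using the platform's displacement over `dt`. To reflect the probabilistic treatment described above, a Monte Carlo helper samples uncertain `(v, omega)` and a jittered `dt`, then reports the mean and covariance of the corrected point. Sampling is a swapped-in technique chosen for clarity and may differ from the paper's analytical propagation. All function names are hypothetical.

```python
import numpy as np

def pose_delta(v, omega, dt):
    """Planar displacement (x, y, yaw) after dt s at constant speed v and yaw rate omega."""
    if abs(omega) < 1e-9:
        return v * dt, 0.0, 0.0
    yaw = omega * dt
    return v / omega * np.sin(yaw), v / omega * (1.0 - np.cos(yaw)), yaw

def correct_point(p, v, omega, dt):
    """Express a point measured dt seconds after the reference time in the reference frame."""
    x, y, yaw = pose_delta(v, omega, dt)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    return R @ np.asarray(p, float) + np.array([x, y])

def correct_point_mc(p, v, omega, dt, cov_v_omega, dt_std, n=2000, seed=0):
    """Monte Carlo mean and covariance of the corrected point under ego-motion
    uncertainty (cov_v_omega) and timestamp jitter (dt_std)."""
    rng = np.random.default_rng(seed)
    motion = rng.multivariate_normal([v, omega], cov_v_omega, size=n)
    dts = dt + rng.normal(0.0, dt_std, size=n)
    pts = np.array([correct_point(p, vs, ws, d) for (vs, ws), d in zip(motion, dts)])
    return pts.mean(axis=0), np.cov(pts.T)
```

For example, at 10 m/s with no rotation, a point seen 5 m ahead 0.1 s after the reference time corresponds to 6 m ahead in the reference frame. Yaw-rate uncertainty mostly spreads the point laterally, in proportion to its range.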
Submitted 9 March, 2020;
originally announced March 2020.
-
Semantic sensor fusion: from camera to sparse lidar information
Authors:
Julie Stephany Berrio,
Mao Shan,
Stewart Worrall,
James Ward,
Eduardo Nebot
Abstract:
To navigate through urban roads, an automated vehicle must be able to perceive and recognize objects in a three-dimensional environment. A high-level contextual understanding of the surroundings is necessary to plan and execute accurate driving maneuvers. This paper presents an approach to fuse different sensory information: Light Detection and Ranging (lidar) scans and camera images. The output of a convolutional neural network (CNN) is used as a classifier to obtain the labels of the environment. The transfer of semantic information between the labelled image and the lidar point cloud is performed in four steps. First, we use heuristic methods to associate probabilities with all the semantic classes contained in the labelled images. Then, the lidar points are corrected to compensate for the vehicle's motion, given the difference between the timestamps of each lidar scan and camera image. In the third step, we calculate the pixel coordinates in the corresponding camera image. In the last step, we transfer semantic information from the heuristic probability images to the lidar frame, while removing the lidar information that is not visible to the camera. We tested our approach on the USyd Dataset \cite{usyd_dataset}, obtaining qualitative and quantitative results that demonstrate the validity of our probabilistic sensory fusion approach.
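Steps three and four can be sketched with a pinhole camera model. Lidar points are transformed into the camera frame with extrinsics `R`, `t`, projected with intrinsics `K`, and then used to read per-class probabilities from an H x W x C probability image. This is a minimal sketch under stated assumptions: the camera frame has z pointing forward, motion correction has already been applied, and lens distortion is ignored. It discards only points behind the camera or outside the image. True occlusion handling, where a point lies inside the image but is hidden from the camera, needs an extra check that is not shown. The function names are hypothetical.

```python
import numpy as np

def project_points(points_lidar, R, t, K, width, height):
    """Pinhole projection of N x 3 lidar points; returns pixel coords and a validity mask."""
    pts_cam = np.asarray(points_lidar, float) @ np.asarray(R, float).T + np.asarray(t, float)
    in_front = pts_cam[:, 2] > 0
    uvw = pts_cam @ np.asarray(K, float).T
    z = np.where(in_front, uvw[:, 2], 1.0)  # avoid dividing by zero or negative depth
    u, v = uvw[:, 0] / z, uvw[:, 1] / z
    valid = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return np.stack([u, v], axis=1), valid

def transfer_labels(points_lidar, prob_image, R, t, K):
    """Attach an image's per-pixel class probabilities to the lidar points that project into it."""
    h, w, num_classes = prob_image.shape
    uv, valid = project_points(points_lidar, R, t, K, w, h)
    probs = np.full((len(uv), num_classes), np.nan)  # NaN marks points the camera cannot label
    cols = np.floor(uv[valid, 0]).astype(int)
    rows = np.floor(uv[valid, 1]).astype(int)
    probs[valid] = prob_image[rows, cols]
    return probs, valid
```

Keeping the full probability vector for each point, rather than only the arg-max label, is what lets later stages fuse labels probabilistically.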
Submitted 3 March, 2020;
originally announced March 2020.