-
New MDS codes of non-GRS type and NMDS codes
Authors:
Yujie Zhi,
Shixin Zhu
Abstract:
Maximum distance separable (MDS) and near maximum distance separable (NMDS) codes have been widely used in various fields such as communication systems, data storage, and quantum codes due to their algebraic properties and excellent error-correcting capabilities. This paper focuses on a specific class of linear codes and establishes necessary and sufficient conditions for them to be MDS or NMDS. Additionally, we employ the well-known Schur method to demonstrate that they are non-equivalent to generalized Reed-Solomon codes.
Submitted 5 June, 2024;
originally announced June 2024.
-
SURESTEP: An Uncertainty-Aware Trajectory Optimization Framework to Enhance Visual Tool Tracking for Robust Surgical Automation
Authors:
Nikhil U. Shinde,
Zih-Yun Chiu,
Florian Richter,
Jason Lim,
Yuheng Zhi,
Sylvia Herbert,
Michael C. Yip
Abstract:
Inaccurate tool localization is one of the main reasons for failures in automating surgical tasks. Imprecise robot kinematics and noisy observations caused by the poor visual acuity of an endoscopic camera make tool tracking challenging. Previous works in surgical automation adopt environment-specific setups or hard-coded strategies instead of explicitly considering motion and observation uncertainty of tool tracking in their policies. In this work, we present SURESTEP, an uncertainty-aware trajectory optimization framework for robust surgical automation. We model the uncertainty of tool tracking with components motivated by the sources of noise in typical surgical scenes. Using a Gaussian assumption to propagate our uncertainty models through a given tool trajectory, SURESTEP provides a general framework that minimizes the upper bound on the entropy of the final estimated tool distribution. We compare SURESTEP with a baseline method on a real-world suture needle regrasping task under challenging environmental conditions, such as poor lighting and a moving endoscopic camera. The results over 60 regrasps on the da Vinci Research Kit (dVRK) demonstrate that our optimized trajectories significantly outperform the un-optimized baseline.
Submitted 29 March, 2024;
originally announced April 2024.
-
GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond
Authors:
Chongjie Ye,
Yinyu Nie,
Jiahao Chang,
Yuantao Chen,
Yihao Zhi,
Xiaoguang Han
Abstract:
We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdoor scenes and improves novel view synthesis. Finally, we propose Gaussian Splatting Surface Reconstruction (GauS), a novel render-then-fuse approach for high-fidelity mesh reconstruction from 3DGS inputs without fine-tuning. Overall, our GauStudio framework, hybrid representation, and GauS approach enhance 3DGS modeling and rendering capabilities, enabling higher-quality novel view synthesis and surface reconstruction.
Submitted 28 March, 2024;
originally announced March 2024.
-
Spontaneous and Explicit Spacetime Symmetry Breaking in Einstein-Cartan Theory with Background Fields
Authors:
Robert Bluhm,
Yu Zhi
Abstract:
Explicit and spontaneous breaking of spacetime symmetry under diffeomorphisms, local translations, and local Lorentz transformations due to the presence of fixed background fields is examined in Einstein-Cartan theory. In particular, the roles of torsion and violation of local translation invariance are highlighted. The nature of the types of background fields that can arise and how they cause spacetime symmetry breaking is discussed. With explicit breaking, potential no-go results are known to exist, which if not evaded lead to inconsistencies between the Bianchi identities, Noether identities, and the equations of motion. These are examined in detail, and the effects of nondynamical backgrounds and explicit breaking on the energy-momentum tensor when torsion is present are discussed as well. Examples illustrating various features of both explicit and spontaneous breaking of local translations are presented and compared to the case of diffeomorphism breaking.
Submitted 19 January, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Finding Biomechanically Safe Trajectories for Robot Manipulation of the Human Body in a Search and Rescue Scenario
Authors:
Elizabeth Peiros,
Zih-Yun Chiu,
Yuheng Zhi,
Nikhil Shinde,
Michael C. Yip
Abstract:
There has been increasing awareness of the difficulties in reaching and extracting people from mass casualty scenarios, such as those arising from natural disasters. While platforms have been designed to consider reaching casualties and even carrying them out of harm's way, the challenge of repositioning a casualty from its found configuration to one suitable for extraction has not been explicitly explored. Furthermore, this planning problem needs to incorporate biomechanical safety considerations for the casualty. Thus, we present a first solution to biomechanically safe trajectory generation for repositioning limbs of unconscious human casualties. We describe biomechanical safety as mathematical constraints, mechanical descriptions of the dynamics for the robot-human coupled system, and the planning and trajectory optimization process that considers this coupled and constrained system. We finally evaluate our approach over several variations of the problem and demonstrate it on a real robot and human subject. This work provides a crucial part of search and rescue that can be used in conjunction with past and present works involving robots and vision systems designed for search and rescue.
Submitted 26 September, 2023;
originally announced September 2023.
-
LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation
Authors:
Yihao Zhi,
Xiaodong Cun,
Xuelin Chen,
Xi Shen,
Wen Guo,
Shaoli Huang,
Shenghua Gao
Abstract:
Gestures are non-verbal but important behaviors accompanying people's speech. While previous methods are able to generate speech rhythm-synchronized gestures, the semantic context of the speech is generally lacking in the gesticulations. Although semantic gestures do not occur very regularly in human speech, they are indeed the key for the audience to understand the speech context in a more immersive environment. Hence, we introduce LivelySpeaker, a framework that realizes semantics-aware co-speech gesture generation and offers several control handles. In particular, our method decouples the task into two stages: script-based gesture generation and audio-guided rhythm refinement. Specifically, the script-based gesture generation leverages the pre-trained CLIP text embeddings as the guidance for generating gestures that are highly semantically aligned with the script. Then, we devise a simple but effective diffusion-based gesture generation backbone simply using pure MLPs, that is conditioned on only audio signals and learns to gesticulate with realistic motions. We utilize such powerful prior to rhyme the script-guided gestures with the audio signals, notably in a zero-shot setting. Our novel two-stage generation framework also enables several applications, such as changing the gesticulation style, editing the co-speech gestures via textual prompting, and controlling the semantic awareness and rhythm alignment with guided diffusion. Extensive experiments demonstrate the advantages of the proposed framework over competing methods. In addition, our core diffusion-based generative model also achieves state-of-the-art performance on two benchmarks. The code and model will be released to facilitate future research.
Submitted 17 September, 2023;
originally announced September 2023.
-
Graph Classification Gaussian Processes via Spectral Features
Authors:
Felix L. Opolka,
Yin-Cong Zhi,
Pietro Liò,
Xiaowen Dong
Abstract:
Graph classification aims to categorise graphs based on their structure and node attributes. In this work, we propose to tackle this task using tools from graph signal processing by deriving spectral features, which we then use to design two variants of Gaussian process models for graph classification. The first variant uses spectral features based on the distribution of energy of a node feature signal over the spectrum of the graph. We show that even such a simple approach, having no learned parameters, can yield competitive performance compared to strong neural network and graph kernel baselines. A second, more sophisticated variant is designed to capture multi-scale and localised patterns in the graph by learning spectral graph wavelet filters, obtaining improved performance on synthetic and real-world data sets. Finally, we show that both models produce well calibrated uncertainty estimates, enabling reliable decision making based on the model predictions.
Submitted 6 June, 2023;
originally announced June 2023.
-
Centralised Design and Production of the Ultra-High Vacuum and Laser-Stabilisation Systems for the AION Ultra-Cold Strontium Laboratories
Authors:
B. Stray,
O. Ennis,
S. Hedges,
S. Dey,
M. Langlois,
K. Bongs,
S. Lellouch,
M. Holynski,
B. Bostwick,
J. Chen,
Z. Eyler,
V. Gibson,
T. L. Harte,
M. Hsu,
M. Karzazi,
J. Mitchell,
N. Mouelle,
U. Schneider,
Y. Tang,
K. Tkalcec,
Y. Zhi,
K. Clarke,
A. Vick,
K. Bridges,
J. Coleman
, et al. (47 additional authors not shown)
Abstract:
This paper outlines the centralised design and production of the Ultra-High-Vacuum sidearm and Laser-Stabilisation systems for the AION Ultra-Cold Strontium Laboratories. Commissioning data on the residual gas and steady-state pressures in the sidearm chambers, on magnetic field quality, on laser stabilisation, and on the loading rate for the 3D Magneto-Optical Trap are presented. Streamlining the design and production of the sidearm and laser stabilisation systems enabled the AION Collaboration to build and equip in parallel five state-of-the-art Ultra-Cold Strontium Laboratories within 24 months by leveraging key expertise in the collaboration. This approach could serve as a model for the development and construction of other cold atom experiments, such as atomic clock experiments and neutral atom quantum computing systems, by establishing dedicated design and production units at national laboratories.
Submitted 31 May, 2023;
originally announced May 2023.
-
Air cold atmospheric plasma with patterns for anaplastic squamous cell carcinoma treatment
Authors:
Fan Bai,
Yingjie Lu,
Yujie Zhi,
Yueye Huang,
Long Li,
Jiaoxiao Luo,
Jamoliddin Razzokov,
Olga Koval,
Maksudbek Yusupov,
Guojun Chen,
Zhitong Chen
Abstract:
In recent years, cold atmospheric plasma (CAP) using inert gas has been successfully applied in biomedicine, such as sterilization, wound healing, skin diseases, and tumor treatment. Here, we report air cold atmospheric plasma with three different patterns (I. Non: basic square grid structure; II. Square: basic square grid structure + square node; III. Circle: basic square grid structure + circle node) for anaplastic squamous cell carcinoma treatment (VX2 cell line). Various plasma diagnostic techniques were applied to evaluate the physics of air CAP with patterns, such as discharge voltage, plasma initial generating process, plasma temperature, and optical emission spectroscopy (OES). The direct effects of air CAP with patterns on anaplastic squamous cell carcinoma treatment (VX2 cell line) were investigated in vitro. We also studied the ROS (reactive oxygen species) and RNS (reactive nitrogen species) generation in cultured media released from VX2 cells after the treatment of air CAP with patterns. The results showed that the air CAP with circle-pattern generated more active substances at a treatment time of 60 s, which resulted in a higher death rate of VX2 cells. These initial observations establish air CAP with patterns as a potential clinical application for cancer therapy.
Submitted 18 April, 2023;
originally announced April 2023.
-
SemHint-MD: Learning from Noisy Semantic Labels for Self-Supervised Monocular Depth Estimation
Authors:
Shan Lin,
Yuheng Zhi,
Michael C. Yip
Abstract:
Without ground truth supervision, self-supervised depth estimation can be trapped in a local minimum due to the gradient-locality issue of the photometric loss. In this paper, we present a framework to enhance depth by leveraging semantic segmentation to guide the network to jump out of the local minimum. Prior works have proposed to share encoders between these two tasks or explicitly align them based on priors like the consistency between edges in the depth and segmentation maps. Yet, these methods usually require ground truth or high-quality pseudo labels, which may not be easily accessible in real-world applications. In contrast, we investigate self-supervised depth estimation along with a segmentation branch that is supervised with noisy labels provided by models pre-trained with limited data. We extend parameter sharing from the encoder to the decoder and study the influence of different numbers of shared decoder parameters on model performance. Also, we propose to use cross-task information to refine current depth and segmentation predictions to generate pseudo-depth and semantic labels for training. The advantages of the proposed method are demonstrated through extensive experiments on the KITTI benchmark and a downstream task for endoscopic tissue deformation tracking.
Submitted 31 March, 2023;
originally announced March 2023.
-
Biomedical image analysis competitions: The state of current participation practice
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Patrick Godau,
Veronika Cheplygina,
Michal Kozubek,
Sharib Ali,
Anubha Gupta,
Jan Kybic,
Alison Noble,
Carlos Ortiz de Solórzano,
Samiksha Pachade,
Caroline Petitjean,
Daniel Sage,
Donglai Wei,
Elizabeth Wilden,
Deepak Alapatt,
Vincent Andrearczyk,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano
, et al. (331 additional authors not shown)
Abstract:
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Submitted 12 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Transductive Kernels for Gaussian Processes on Graphs
Authors:
Yin-Cong Zhi,
Felix L. Opolka,
Yin Cheng Ng,
Pietro Liò,
Xiaowen Dong
Abstract:
Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization framework by treating the graph and feature data as two Hilbert spaces. We also show how numerous kernel-based models on graphs are instances of our design. A kernel defined this way has transductive properties, and this leads to improved ability to learn on fewer training points, as well as better handling of highly non-Euclidean data. We demonstrate these advantages using synthetic data where the distribution of the whole graph can inform the pattern of the labels. Finally, by utilizing a flexible polynomial of the graph Laplacian within the kernel, the model also performed effectively in semi-supervised classification on graphs of various levels of homophily.
Submitted 28 November, 2022;
originally announced November 2022.
-
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Authors:
Jie Wang,
Yuji Liu,
Binling Wang,
Yiming Zhi,
Song Li,
Shipeng Xia,
Jiayang Zhang,
Feng Tong,
Lin Li,
Qingyang Hong
Abstract:
This paper describes a spatial-aware speaker diarization system for the multi-channel multi-party meeting. The diarization system obtains speaker direction information from a microphone array. A speaker-spatial embedding is generated from the x-vector and an s-vector derived from superdirective beamforming (SDB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named discriminative multi-stream neural network (DMSNet), which consists of an attention superdirective beamforming (ASDB) block and a Conformer encoder. The proposed ASDB is a self-adapted channel-wise block that extracts the latent spatial features of array audio by modeling interdependencies between channels. We explore DMSNet to address the overlapped speech problem on multi-channel audio and achieve 93.53% accuracy on the evaluation set. With the DMSNet-based overlapped speech detection (OSD) module, the diarization error rate (DER) of the cluster-based diarization system decreases significantly from 13.45% to 7.64%.
Submitted 24 September, 2022;
originally announced September 2022.
-
Dual-Space NeRF: Learning Animatable Avatars and Scene Lighting in Separate Spaces
Authors:
Yihao Zhi,
Shenhan Qian,
Xinhao Yan,
Shenghua Gao
Abstract:
Modeling the human body in a canonical space is a common practice for capturing and animation. But when involving the neural radiance field (NeRF), learning a static NeRF in the canonical space is not enough because the lighting of the body changes when the person moves even though the scene lighting is constant. Previous methods alleviate the inconsistency of lighting by learning a per-frame embedding, but this operation does not generalize to unseen poses. Given that the lighting condition is static in the world space while the human body is consistent in the canonical space, we propose a dual-space NeRF that models the scene lighting and the human body with two MLPs in two separate spaces. To bridge these two spaces, previous methods mostly rely on the linear blend skinning (LBS) algorithm. However, the blending weights for LBS of a dynamic neural field are intractable and thus are usually memorized with another MLP, which does not generalize to novel poses. Although it is possible to borrow the blending weights of a parametric mesh such as SMPL, the interpolation operation introduces more artifacts. In this paper, we propose to use the barycentric mapping, which can directly generalize to unseen poses and surprisingly achieves better results than LBS with neural blending weights. Quantitative and qualitative results on the Human3.6M and the ZJU-MoCap datasets show the effectiveness of our method.
Submitted 31 August, 2022;
originally announced August 2022.
-
Manipulating Random Lasing Correlations in Doped Liquid Crystals
Authors:
Yiyang Zhi,
Andrew Lininger,
Giuseppe Strangi
Abstract:
Random lasers are highly configurable light sources that are promising for imaging and photonic integration. In this study, random lasing action was generated by optically pumping MBBA liquid crystals infiltrated with gold nanoparticles and laser dye (pyrromethene 597). By varying the pump energy near the lasing threshold, we show that it is possible to control the intensity correlations between the random lasing modes. The correlations in the system were phenomenologically characterized using the Lévy statistics of the emission spectra survival function. We also find that correlations and persistence of lasing action are correlated. These results demonstrate the possibility to dynamically control a key physical feature of random lasers, which may find applications in biomedical settings and network communications.
Submitted 30 April, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Semiconductor ring laser frequency combs with active directional couplers
Authors:
Dmitry Kazakov,
Theodore P. Letsou,
Maximilian Beiser,
Yiyang Zhi,
Nikola Opačak,
Marco Piccardo,
Benedikt Schwarz,
Federico Capasso
Abstract:
Rapid development of Fabry-Perot quantum cascade laser frequency combs has converted them from laboratory devices to key components of next-generation fast molecular spectrometers. Recently, free-running ring quantum cascade lasers allowed generation of new frequency comb states induced by phase turbulence. In absence of efficient light outcoupling, ring quantum cascade lasers are not suited for applications as they are limited in their power output to microwatt levels. Here we demonstrate electrically pumped ring quantum cascade lasers with integrated active directional couplers. These devices generate self-starting frequency combs and have output power above ten milliwatts at room temperature. We study the transmission of the ring-waveguide resonator system below the lasing threshold, which reveals the ability to individually control the mode indices in the coupled resonators, their quality factors, and the coupling coefficient. When the ring resonator is pumped above the lasing threshold, the intracavity unidirectional single-mode field parametrically amplifies an externally injected signal tuned into one of the ring resonances, generating an idler sideband via four-wave mixing. The ability to inject external optical signals into integrated laser cavities brings into reach coherent control of frequency comb states in ring semiconductor lasers. Furthermore, tunable coupled active resonators pumped below the lasing threshold enable a versatile platform for the studies of resonant electromagnetic effects, ranging from strong coupling to parity-time symmetry breaking.
Submitted 8 June, 2022; v1 submitted 7 June, 2022;
originally announced June 2022.
-
The xmuspeech system for multi-channel multi-party meeting transcription challenge
Authors:
Jie Wang,
Yuji Liu,
Binling Wang,
Yiming Zhi,
Song Li,
Shipeng Xia,
Jiayang Zhang,
Lin Li,
Qingyang Hong,
Feng Tong
Abstract:
This paper describes the system developed by the XMUSPEECH team for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). For the speaker diarization task, we propose a multi-channel speaker diarization system that obtains spatial information of the speaker by Difference of Arrival (DOA) technology. A speaker-spatial embedding is generated from the x-vector and an s-vector derived from Filter-and-Sum Beamforming (FSB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named Discriminative Multi-stream Neural Network (DMSNet), which consists of an Attention Filter-and-Sum block (AFSB) and a Conformer encoder. We explore DMSNet to address the overlapped speech problem on multi-channel audio. Compared with an LSTM-based OSD module, we achieve a decrease of 10.1% in Detection Error Rate (DetER). With the DMSNet-based OSD module, the DER of the cluster-based diarization system decreases significantly from 13.44% to 7.63%. Our best fusion system achieves diarization error rates (DER) of 7.09% and 9.80% on the evaluation set and test set, respectively.
Submitted 11 February, 2022;
originally announced February 2022.
-
Configuration Space Decomposition for Scalable Proxy Collision Checking in Robot Planning and Control
Authors:
Mrinal Verghese,
Nikhil Das,
Yuheng Zhi,
Michael Yip
Abstract:
Real-time robot motion planning in complex high-dimensional environments remains an open problem. Motion planning algorithms, and their underlying collision checkers, are crucial to any robot control stack. Collision checking takes up a large portion of the computational time in robot motion planning. Existing collision checkers make trade-offs between speed and accuracy and scale poorly to high-dimensional, complex environments. We present a novel space decomposition method using K-Means clustering in the Forward Kinematics space to accelerate proxy collision checking. We train individual configuration space models using Fastron, a kernel perceptron algorithm, on these decomposed subspaces, yielding compact yet highly accurate models that can be queried rapidly and scale better to more complex environments. We demonstrate this new method, called Decomposed Fast Perceptron (D-Fastron), on the 7-DOF Baxter robot producing on average 29x faster collision checks and up to 9.8x faster motion planning compared to state-of-the-art geometric collision checkers.
Submitted 26 January, 2022; v1 submitted 12 January, 2022;
originally announced January 2022.
-
Affine geometry and Frobenius algebra
Authors:
Kefeng Liu,
Hao Xu,
Yanhui Zhi
Abstract:
The associativity of the multiplication on a Frobenius manifold is equivalent to the WDVV equation of a symmetric cubic form in flat coordinates. A Frobenius manifold can be regarded as a very special type of statistical manifold. There is a natural commutative product on each tangent space of a statistical manifold. We show that it is associative, hence making it into a manifold with Frobenius algebra structure, if and only if the sectional $K$-curvature vanishes. In other words, the WDVV equation is equivalent to zero sectional $K$-curvature. This gives a curvature interpretation of the WDVV equation.
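For orientation, the associativity condition mentioned in the abstract is commonly written as below. This is the standard textbook form, not a formula quoted from the paper; the potential $\Phi$, the flat coordinates $t^i$, and the flat metric $\eta$ are notational assumptions introduced here.

```latex
c_{ijk} = \frac{\partial^3 \Phi}{\partial t^i\,\partial t^j\,\partial t^k},
\qquad
c_{ijm}\,\eta^{mn}\,c_{nkl} \;=\; c_{kjm}\,\eta^{mn}\,c_{nil}
```

With the product defined by $e_i \circ e_j = c_{ijm}\,\eta^{mn}\,e_n$, the second identity states $(e_i \circ e_j) \circ e_k = e_i \circ (e_j \circ e_k)$ once the symmetry of $c_{ijk}$ is used.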
Submitted 17 November, 2021;
originally announced November 2021.
-
Adaptive Gaussian Processes on Graphs via Spectral Graph Wavelets
Authors:
Felix L. Opolka,
Yin-Cong Zhi,
Pietro Liò,
Xiaowen Dong
Abstract:
Graph-based models require aggregating information in the graph from neighbourhoods of different sizes. In particular, when the data exhibit varying levels of smoothness on the graph, a multi-scale approach is required to capture the relevant information. In this work, we propose a Gaussian process model using spectral graph wavelets, which can naturally aggregate neighbourhood information at different scales. Through maximum likelihood optimisation of the model hyperparameters, the wavelets automatically adapt to the different frequencies in the data, and as a result our model goes beyond capturing low frequency information. We achieve scalability to larger graphs by using a spectrum-adaptive polynomial approximation of the filter function, which is designed to yield a low approximation error in dense areas of the graph spectrum. Synthetic and real-world experiments demonstrate the ability of our model to infer scales accurately and produce competitive performances against state-of-the-art models in graph-based learning tasks.
Submitted 20 February, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
Authors:
Shenhan Qian,
Zhi Tu,
Yihao Zhi,
Wen Liu,
Shenghua Gao
Abstract:
Co-speech gesture generation aims to synthesize a gesture sequence that not only looks real but also matches the input speech audio. Our method generates the movements of a complete upper body, including arms, hands, and the head. Although recent data-driven methods achieve great success, challenges still exist, such as limited variety, poor fidelity, and lack of objective metrics. Motivated by the fact that the speech cannot fully determine the gesture, we design a method that learns a set of gesture template vectors to model the latent conditions, which relieves the ambiguity. In our method, the template vector determines the general appearance of a generated gesture sequence, while the speech audio drives subtle movements of the body; both are indispensable for synthesizing a realistic gesture sequence. Due to the intractability of an objective metric for gesture-speech synchronization, we adopt the lip-sync error as a proxy metric to tune and evaluate the synchronization ability of our model. Extensive experiments show the superiority of our method in both objective and subjective evaluations of fidelity and synchronization.
Submitted 29 November, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
-
OLR 2021 Challenge: Datasets, Rules and Baselines
Authors:
Binling Wang,
Wenxuan Hu,
Jing Li,
Yiming Zhi,
Zheng Li,
Qingyang Hong,
Lin Li,
Dong Wang,
Liming Song,
Cheng Yang
Abstract:
This paper introduces the sixth Oriental Language Recognition (OLR) 2021 Challenge, which intends to improve the performance of language recognition systems and speech recognition systems within multilingual scenarios. The data profile, four tasks, two baselines, and the evaluation principles are introduced in this paper. In addition to the Language Identification (LID) tasks, multilingual Automatic Speech Recognition (ASR) tasks are introduced to the OLR 2021 Challenge for the first time. The challenge this year focuses on more practical and challenging problems, with four tasks: (1) constrained LID, (2) unconstrained LID, (3) constrained multilingual ASR, (4) unconstrained multilingual ASR. Baselines for LID tasks and multilingual ASR tasks are provided, respectively. The LID baseline system is an extended TDNN x-vector model constructed with PyTorch. A transformer-based end-to-end model is provided as the multilingual ASR baseline system. These recipes will be published online and made available for participants to construct their own LID or ASR systems. The baseline results demonstrate that these tasks are rather challenging and deserve more effort to achieve better performance.
Submitted 23 July, 2021;
originally announced July 2021.
-
Oriental Language Recognition (OLR) 2020: Summary and Analysis
Authors:
Jing Li,
Binling Wang,
Yiming Zhi,
Zheng Li,
Lin Li,
Qingyang Hong,
Dong Wang
Abstract:
The fifth Oriental Language Recognition (OLR) Challenge focuses on language recognition in a variety of complex environments to promote its development. The OLR 2020 Challenge includes three tasks: (1) cross-channel language identification, (2) dialect identification, and (3) noisy language identification. We choose Cavg as the principal evaluation metric, and the Equal Error Rate (EER) as the secondary metric. There were 58 teams participating in this challenge, and one third of the teams submitted valid results. Compared with the best baseline, the Cavg values of the Top 1 system for the three tasks were relatively reduced by 82%, 62% and 48%, respectively. This paper describes the three tasks, the database profile, and the final results. We also outline the novel approaches that improve the performance of language recognition systems most significantly, such as the utilization of auxiliary information.
Submitted 5 July, 2021;
originally announced July 2021.
-
An Integrated Framework for Two-pass Personalized Voice Trigger
Authors:
Dexin Liao,
Jing Li,
Yiming Zhi,
Song Li,
Qingyang Hong,
Lin Li
Abstract:
In this paper, we present the XMUSPEECH system for Task 1 of the 2020 Personalized Voice Trigger Challenge (PVTC2020). Task 1 is joint wake-up word detection with speaker verification on close-talking data. The whole system consists of a keyword spotting (KWS) sub-system and a speaker verification (SV) sub-system. For the KWS system, we applied a Temporal Depthwise Separable Convolution Residual Network (TDSC-ResNet) to improve the system's performance. For the SV system, we proposed a multi-task learning network, where the phonetic branch is trained with the character labels of the utterance and the speaker branch is trained with the speaker label. The phonetic branch is optimized with connectionist temporal classification (CTC) loss and is treated as an auxiliary module for the speaker branch. Experiments show that our system achieves significant improvements over the baseline system.
Submitted 30 June, 2021;
originally announced June 2021.
-
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Authors:
Yuan Zhi,
Zhan Tong,
Limin Wang,
Gangshan Wu
Abstract:
Frame sampling is a fundamental problem in video action recognition due to the essential redundancy in time and limited computation resources. The existing sampling strategy often employs a fixed frame selection and lacks the flexibility to deal with complex variations in videos. In this paper, we present a simple, sparse, and explainable frame sampler, termed Motion-Guided Sampler (MGSampler). Our basic motivation is that motion is an important and universal signal that can drive us to adaptively select frames from videos. Accordingly, we propose two important properties in our MGSampler design: motion sensitive and motion uniform. First, we present two different motion representations to enable us to efficiently distinguish the motion-salient frames from the background. Then, we devise a motion-uniform sampling strategy based on the cumulative motion distribution to ensure the sampled frames evenly cover all the important segments with high motion salience. Our MGSampler yields a new principled and holistic sampling scheme that could be incorporated into any existing video architecture. Experiments on five benchmarks demonstrate the effectiveness of our MGSampler over the previous fixed sampling strategies, and its generalization power across different backbones, video models, and datasets.
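The motion-uniform idea described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the motion score used here (mean absolute difference between consecutive frames) is an assumed stand-in for the paper's motion representations, and the function names `motion_scores` and `motion_uniform_sample` are hypothetical.

```python
import numpy as np


def motion_scores(frames: np.ndarray) -> np.ndarray:
    """Assumed motion proxy: mean absolute difference to the previous frame.

    frames has shape (T, ...); the first frame gets a score of 0.
    """
    frames = frames.astype(np.float64)
    diffs = np.abs(frames[1:] - frames[:-1])
    scores = diffs.reshape(len(frames) - 1, -1).mean(axis=1)
    return np.concatenate([[0.0], scores])


def motion_uniform_sample(scores: np.ndarray, num_samples: int) -> np.ndarray:
    """Pick frame indices at evenly spaced quantiles of the cumulative motion."""
    scores = np.asarray(scores, dtype=np.float64)
    total = scores.sum()
    if total <= 0:
        # No motion at all: fall back to plain uniform temporal sampling.
        return np.linspace(0, len(scores) - 1, num_samples).round().astype(int)
    cdf = np.cumsum(scores) / total
    # Centre of each of the num_samples equal-mass bins of the distribution.
    targets = (np.arange(num_samples) + 0.5) / num_samples
    idx = np.searchsorted(cdf, targets, side="left")
    return np.minimum(idx, len(scores) - 1)
```

Because the indices follow the cumulative motion curve rather than the time axis, static stretches of a clip contribute few samples and high-motion stretches contribute many.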
Submitted 20 August, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.
-
Data-driven Actuator Selection for Artificial Muscle-Powered Robots
Authors:
Taylor West Henderson,
Yuheng Zhi,
Angela Liu,
Michael C. Yip
Abstract:
Even though artificial muscles have gained popularity due to their compliant, flexible, and compact properties, there currently does not exist an easy way of making informed decisions on the appropriate actuation strategy when designing a muscle-powered robot, thus limiting the transition of such technologies into broader applications. What's more, when a new muscle actuation technology is developed, it is difficult to compare it against existing robot muscles. To accelerate the development of artificial muscle applications, we propose a data-driven approach for robot muscle actuator selection using Support Vector Machines (SVM). This first-of-its-kind method gives users insight into which actuators fit their specific needs and actuation performance criteria, making it possible for researchers and engineers with little to no prior knowledge of artificial muscles to focus on application design. It also provides a platform to benchmark existing, new, or yet-to-be-discovered artificial muscle technologies. We test our method on unseen existing robot muscle designs to prove its usability in real-world applications. We provide an open-access, web-searchable interface for easy access to our models that will additionally allow for continuous contribution of new actuator data from groups around the world to enhance and expand these models.
Submitted 14 April, 2021;
originally announced April 2021.
-
DiffCo: Auto-Differentiable Proxy Collision Detection with Multi-class Labels for Safety-Aware Trajectory Optimization
Authors:
Yuheng Zhi,
Nikhil Das,
Michael Yip
Abstract:
The objective of trajectory optimization algorithms is to achieve an optimal collision-free path between a start and goal state. In real-world scenarios where environments can be complex and non-homogeneous, a robot needs to be able to gauge whether a state will be in collision with various objects in order to meet some safety metrics. The collision detector should be computationally efficient and, ideally, analytically differentiable to facilitate stable and rapid gradient descent during optimization. However, methods today lack an elegant approach to detect collision differentiably, relying rather on numerical gradients that can be unstable. We present DiffCo, the first, fully auto-differentiable, non-parametric model for collision detection. Its non-parametric behavior allows one to compute collision boundaries on-the-fly and update them, requiring no pre-training and allowing it to update continuously in dynamic environments. It provides robust gradients for trajectory optimization via backpropagation and is often 10-100x faster to compute than its geometric counterparts. DiffCo also extends trivially to modeling different object collision classes for semantically informed trajectory optimization.
Submitted 18 February, 2022; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Conceptual design of the Spin Physics Detector
Authors:
V. M. Abazov,
V. Abramov,
L. G. Afanasyev,
R. R. Akhunzyanov,
A. V. Akindinov,
N. Akopov,
I. G. Alekseev,
A. M. Aleshko,
V. Yu. Alexakhin,
G. D. Alexeev,
M. Alexeev,
A. Amoroso,
I. V. Anikin,
V. F. Andreev,
V. A. Anosov,
A. B. Arbuzov,
N. I. Azorskiy,
A. A. Baldin,
V. V. Balandina,
E. G. Baldina,
M. Yu. Barabanov,
S. G. Barsov,
V. A. Baskov,
A. N. Beloborodov,
I. N. Belov
, et al. (270 additional authors not shown)
Abstract:
The Spin Physics Detector, a universal facility for studying the nucleon spin structure and other spin-related phenomena with polarized proton and deuteron beams, is proposed to be placed in one of the two interaction points of the NICA collider that is under construction at the Joint Institute for Nuclear Research (Dubna, Russia). At the heart of the project there is huge experience with polarized beams at JINR.
The main objective of the proposed experiment is the comprehensive study of the unpolarized and polarized gluon content of the nucleon. Spin measurements at the Spin Physics Detector at the NICA collider have bright perspectives to make a unique contribution and challenge our understanding of the spin structure of the nucleon. In this document the Conceptual Design of the Spin Physics Detector is presented.
Submitted 2 February, 2022; v1 submitted 31 January, 2021;
originally announced February 2021.
-
A parallel-in-time two-sided preconditioning for all-at-once system from a non-local evolutionary equation with weakly singular kernel
Authors:
Xue-lei Lin,
Michael K. Ng,
Yajing Zhi
Abstract:
In this paper, we study a parallel-in-time (PinT) algorithm for the all-at-once system arising from a non-local evolutionary equation with weakly singular kernel, where the temporal term involves a non-local convolution with a weakly singular kernel and the spatial term is the usual Laplacian operator with variable coefficients. We propose to use a two-sided preconditioning technique for the all-at-once discretization of the equation. Our preconditioner is constructed by replacing the variable diffusion coefficients with a constant coefficient to obtain a constant-coefficient all-at-once matrix. We split a square root of the constant Laplacian operator out of the constant-coefficient all-at-once matrix as a right preconditioner and take the remaining part as a left preconditioner, which constitutes our two-sided preconditioning. Exploiting the diagonalizability of the constant-Laplacian matrix and the triangular Toeplitz structure of the temporal discretization matrix, we obtain efficient representations of the inverses of the right and left preconditioners, so that the iterative solution can be updated quickly in a PinT manner. Theoretically, the condition number of the two-sided preconditioned matrix is proven to be uniformly bounded by a constant independent of the matrix size. To the best of our knowledge, for the non-local evolutionary equation with variable coefficients, this is the first attempt to develop a PinT preconditioning technique that has a fast and exact implementation and for which the corresponding preconditioned system has a uniformly bounded condition number. Numerical results are reported to confirm the efficiency of the proposed two-sided preconditioning technique.
Submitted 30 January, 2021;
originally announced February 2021.
-
Spatially Correlated Patterns in Adversarial Images
Authors:
Nandish Chattopadhyay,
Lionell Yip En Zhi,
Bryan Tan Bing Xing,
Anupam Chattopadhyay
Abstract:
Adversarial attacks have proved to be the major impediment in the progress on research towards reliable machine learning solutions. Carefully crafted perturbations, imperceptible to human vision, can be added to images to force misclassification by an otherwise high performing neural network. To have a better understanding of the key contributors of such structured attacks, we searched for and studied spatially co-located patterns in the distribution of pixels in the input space. In this paper, we propose a framework for segregating and isolating regions within an input image which are particularly critical towards either classification (during inference), or adversarial vulnerability or both. We assert that during inference, the trained model looks at a specific region in the image, which we call Region of Importance (RoI); and the attacker looks at a region to alter/modify, which we call Region of Attack (RoA). The success of this approach could also be used to design a post-hoc adversarial defence method, as illustrated by our observations. This uses the notion of blocking out (we call neutralizing) that region of the image which is highly vulnerable to adversarial attacks but is not important for the task of classification. We establish the theoretical setup for formalising the process of segregation, isolation and neutralization and substantiate it through empirical analysis on standard benchmarking datasets. The findings strongly indicate that mapping features into the input space preserves the significant patterns typically observed in the feature-space while adding major interpretability and therefore simplifies potential defensive mechanisms.
Submitted 21 November, 2020;
originally announced November 2020.
-
Neurodegenerative damage reduces firing coherence in a continuous attractor model of grid cells
Authors:
Yuduo Zhi,
Daniel L. Cox
Abstract:
Grid cells in the dorsolateral band of the medial entorhinal cortex (dMEC) display strikingly regular periodic firing patterns on a lattice of positions in 2-D space. This helps animals to encode relative spatial location without reference to external cues. The dMEC is damaged in the early stages of Alzheimer's Disease, which affects navigation ability of a disease victim, reducing the synaptic density of neurons in the network. Within an established 2-dimensional continuous attractor neural network model of grid cell activity, we introduce damage parameterized by radius and by the strength of the synaptic output for neurons in the damaged region. The proportionality of the grid field flow on the dMEC to the velocity of the model organism is maintained, but when we examine the coherence of the grid cell firing field in the form of the Fourier transform (Bragg peaks) of the grid lattice, we find that a wide range of damage radius and strength induces an incoherent structure with only a single central peak, adjacent to narrow bands of striped patterns (two additional peaks), which abut an orthorhombic pattern (four additional peaks), which in turn abuts the undamaged hexagonal region (six additional peaks). Within the damaged region, grid cells show no Bragg peaks, and outside the damaged region the central Bragg peak strength is largely unaffected. There is a re-entrant region of normal grid firing for very large damage area. We anticipate that the modified grid cell behavior can be observed in non-invasive fMRI imaging of the dMEC.
Submitted 13 April, 2021; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Gaussian Processes on Graphs via Spectral Kernel Learning
Authors:
Yin-Cong Zhi,
Yin Cheng Ng,
Xiaowen Dong
Abstract:
We propose a graph spectrum-based Gaussian process for prediction of signals defined on nodes of the graph. The model is designed to capture various graph signal structures through a highly adaptive kernel that incorporates a flexible polynomial function in the graph spectral domain. Unlike most existing approaches, we propose to learn such a spectral kernel, where the polynomial setup enables learning without the need for eigen-decomposition of the graph Laplacian. In addition, this kernel has the interpretability of graph filtering achieved by a bespoke maximum likelihood learning algorithm that enforces the positivity of the spectrum. We demonstrate the interpretability of the model in synthetic experiments from which we show the various ground truth spectral filters can be accurately recovered, and the adaptability translates to superior performances in the prediction of real-world graph data of various characteristics.
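The point that a polynomial filter avoids eigen-decomposition of the graph Laplacian can be illustrated with a small sketch. It is not the authors' model: building the covariance as `B @ B.T` with `B` a polynomial in the normalized Laplacian is an assumption made here so that the kernel is positive semi-definite by construction, and the names `normalized_laplacian`, `polynomial_filter`, and `spectral_kernel` are hypothetical.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=np.float64)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def polynomial_filter(lap: np.ndarray, coeffs) -> np.ndarray:
    """Evaluate B = sum_k coeffs[k] * L^k with Horner's rule (matrix products only)."""
    n = lap.shape[0]
    result = np.zeros((n, n))
    for c in reversed(list(coeffs)):
        result = result @ lap + c * np.eye(n)
    return result


def spectral_kernel(lap: np.ndarray, coeffs) -> np.ndarray:
    """Assumed kernel form K = B B^T, positive semi-definite for any coefficients."""
    b = polynomial_filter(lap, coeffs)
    return b @ b.T
```

In the spectral domain this kernel corresponds to the filter response `g(lambda)**2` with `g` the chosen polynomial, yet no eigenvectors are ever computed; the coefficients would be the hyperparameters to learn.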
Submitted 28 October, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Flow Based Self-supervised Pixel Embedding for Image Segmentation
Authors:
Bin Ma,
Shubao Liu,
Yingxuan Zhi,
Qi Song
Abstract:
We propose a new self-supervised approach to image feature learning from motion cue. This new approach leverages recent advances in deep learning in two directions: 1) the success of training deep neural network in estimating optical flow in real data using synthetic flow data; and 2) emerging work in learning image features from motion cues, such as optical flow. Building on these, we demonstrate that image features can be learned in self-supervision by first training an optical flow estimator with synthetic flow data, and then learning image features from the estimated flows in real motion data. We demonstrate and evaluate this approach on an image segmentation task. Using the learned image feature representation, the network performs significantly better than the ones trained from scratch in few-shot segmentation tasks.
Submitted 8 January, 2019; v1 submitted 2 January, 2019;
originally announced January 2019.
-
Augmented Reality Predictive Displays to Help Mitigate the Effects of Delayed Telesurgery
Authors:
Florian Richter,
Yifei Zhang,
Yuheng Zhi,
Ryan K. Orosco,
Michael C. Yip
Abstract:
Surgical robots offer the exciting potential for remote telesurgery, but advances are needed to make this technology efficient and accurate to ensure patient safety. Achieving these goals is hindered by the deleterious effects of latency between the remote operator and the bedside robot. Predictive displays have found success in overcoming these effects by giving the operator immediate visual feedback. However, previously developed predictive displays cannot be directly applied to telesurgery due to the unique challenges in tracking the 3D geometry of the surgical environment. In this paper, we present the first predictive display for teleoperated surgical robots. The predicted display is stereoscopic, utilizes Augmented Reality (AR) to show the predicted motions alongside the complex tissue found in-situ within surgical environments, and overcomes the challenges in accurately tracking slave-tools in real-time. We call this a Stereoscopic AR Predictive Display (SARPD). To test the SARPD's performance, we conducted a user study with ten participants on the da Vinci® Surgical System. The results showed with statistical significance that using SARPD decreased the time to complete the task while having no effect on error rates when operating under delay.
Submitted 20 February, 2019; v1 submitted 23 September, 2018;
originally announced September 2018.
-
Wide field-of-view and high-efficiency light concentrator
Authors:
Yu Zhi,
Ye Liang,
Zhe Wang,
Shaomin Chen
Abstract:
To improve light yield and energy resolution in large-volume neutrino detectors, light concentrators are often mounted on photomultiplier tubes to increase the detection efficiency of optical photons from scintillation or Cherenkov light induced by charged particles. We propose a method to optimize previous light concentrator designs in order to attain a field of view of 90 degrees and a geometrical collection efficiency above 98%. This improvement could be crucial to Jinping and other future neutrino experiments wherever it is applicable.
Submitted 28 December, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Detection of Single Nanoparticles Using the Dissipative Interaction in a High-Q Microcavity
Authors:
Bo-Qiang Shen,
Xiao-Chong Yu,
Yanyan Zhi,
Li Wang,
Donghyun Kim,
Qihuang Gong,
Yun-Feng Xiao
Abstract:
Ultrasensitive optical detection of nanometer-scale particles is highly desirable for applications in early-stage diagnosis of human diseases, environmental monitoring, and homeland security, but remains extremely difficult due to the ultralow polarizabilities of small-sized, low-index particles. Optical whispering-gallery-mode microcavities, which can significantly enhance the light-matter interaction, have emerged as promising platforms for label-free detection of nanoscale objects. In contrast to conventional whispering-gallery-mode sensing, which relies on the reactive (i.e., dispersive) interaction, here we propose and demonstrate the detection of single lossy nanoparticles using the dissipative interaction in a high-$Q$ toroidal microcavity. In the experiment, detection of single gold nanorods in an aqueous environment is realized by simultaneously monitoring the linewidth change and shift of the cavity mode. The experimental result falls within the theoretical prediction. Remarkably, the reactive and dissipative sensing methods are evaluated by setting the probe wavelength on and off the surface plasmon resonance to tune the absorption of nanorods, which clearly demonstrates the great potential of the dissipative sensing method to detect lossy nanoparticles. Future applications could also combine the dissipative and reactive sensing methods, which may provide better characterizations of nanoparticles.
Submitted 8 April, 2016;
originally announced April 2016.
-
Urban spatial-temporal activity structures: a New Approach to Inferring the Intra-urban Functional Regions via Social Media Check-In Data
Authors:
Ye Zhi,
Yu Liu,
Shaowen Wang,
Min Deng,
Jing Gao,
Haifeng Li
Abstract:
Most existing literature focuses on the exterior temporal rhythm of human movement to infer the functional regions in a city, but it neglects the underlying interdependence between the functional regions and human activities, which uncovers more detailed characteristics of regions. In this research, we propose a novel model based on the low rank approximation (LRA) to detect the functional regions using data from about 15 million check-in records collected over a yearlong period in Shanghai, China. We find a series of latent structures, called urban spatial-temporal activity structures (USTAS). Interpreting these structures reveals a series of notable underlying associations between the spatial and temporal activity patterns. Moreover, we can not only reproduce the observed data with a lower-dimensional representation but also simultaneously project both the spatial and temporal activity patterns into the same coordinate system. By applying the K-means clustering algorithm, we obtain five significant types of clusters, each directly annotated with a corresponding combination of temporal activities. This provides a clear picture of how the groups of regions are associated with different activities at different times of day. Besides the commercial and transportation dominant areas, we also detect two kinds of residential areas: the developed residential areas and the developing residential areas. We further verify the spatial distribution of these clusters from the viewpoint of urban form analysis. The results show a high consistency with the government planning from the same periods, indicating that our model is applicable for inferring the functional regions via social media check-in data and can benefit a wide range of fields, such as urban planning, public services, and location-based recommender systems.
Submitted 20 January, 2015; v1 submitted 23 December, 2014;
originally announced December 2014.
-
Device-independent bounds for Hardy's experiment
Authors:
Rafael Rabelo,
Law Yun Zhi,
Valerio Scarani
Abstract:
In this Letter we compute an analogue of Tsirelson's bound for Hardy's test of nonlocality, that is, the maximum violation of locality constraints allowed by the quantum formalism, irrespective of the dimension of the system. The value is found to be the same as the one already achievable with two-qubit systems, and we show that only a very specific class of states can lead to such a maximal value, thus highlighting Hardy's test as a device-independent self-test protocol for such states. By considering realistic constraints in Hardy's test, we also compute device-independent upper bounds on this violation and show that these bounds are saturated by two-qubit systems, thus showing that there is no advantage in using higher-dimensional systems in experimental implementations of such a test.
Submitted 17 July, 2012; v1 submitted 15 May, 2012;
originally announced May 2012.