-
MDeRainNet: An Efficient Neural Network for Rain Streak Removal from Macro-pixel Images
Authors:
Tao Yan,
Weijiang He,
Chenglong Wang,
Xiangjie Zhu,
Yinghui Wang,
Rynson W. H. Lau
Abstract:
Since rainy weather always degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benefit rain streak detection and removal. However, existing LF image rain removal methods either do not fully exploit the global correlations of 4D LF data or only utilize partial sub-views, resulting in sub-optimal rain removal performance and uneven quality across the de-rained sub-views. In this paper, we propose an efficient network, called MDeRainNet, for rain streak removal from LF images. The proposed network adopts a multi-scale encoder-decoder architecture, which directly works on Macro-pixel images (MPIs) to improve the rain removal performance. To fully model the global correlation between the spatial and the angular information, we propose an Extended Spatial-Angular Interaction (ESAI) module to merge them, in which a simple and effective Transformer-based Spatial-Angular Interaction Attention (SAIA) block is also proposed for modeling long-range geometric correlations and making full use of the angular information. Furthermore, to improve the generalization performance of our network on real-world rainy scenes, we propose a novel semi-supervised learning framework for our MDeRainNet, which utilizes multi-level KL loss to bridge the domain gap between features of synthetic and real-world rain streaks and introduces colored-residue image guided contrastive regularization to reconstruct rain-free images. Extensive experiments conducted on synthetic and real-world LF images demonstrate that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.
Submitted 15 June, 2024;
originally announced June 2024.
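The MDeRainNet abstract above says the network operates directly on macro-pixel images (MPIs). As a rough illustration of what that input looks like, the following sketch rearranges a 4D light field into an MPI. It assumes a light field indexed as `lf[u][v][h][w]` and one common layout convention in which every spatial location becomes a U x V block of angular samples; the function name and the convention are illustrative assumptions, not taken from the paper.

```python
def to_macro_pixel_image(lf):
    """Rearrange a 4D light field lf[u][v][h][w] into a 2D macro-pixel image.

    Assumed convention (not from the paper): pixel (h, w) of sub-view (u, v)
    lands at row h * U + u, column w * V + v.
    """
    U, V = len(lf), len(lf[0])
    H, W = len(lf[0][0]), len(lf[0][0][0])
    mpi = [[None] * (W * V) for _ in range(H * U)]
    for u in range(U):
        for v in range(V):
            for h in range(H):
                for w in range(W):
                    mpi[h * U + u][w * V + v] = lf[u][v][h][w]
    return mpi


# Toy light field: 2x2 views, each 2x3 pixels; every sample stores its own index.
lf = [[[[(u, v, h, w) for w in range(3)] for h in range(2)]
       for v in range(2)] for u in range(2)]
mpi = to_macro_pixel_image(lf)
```

In this layout, neighbouring MPI pixels inside one block come from different sub-views of the same scene point, which is why a background pixel hidden by a streak in one view can sit next to an unobstructed sample from another view.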
-
DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors
Authors:
Tianyu Huang,
Yihan Zeng,
Hui Li,
Wangmeng Zuo,
Rynson W. H. Lau
Abstract:
Dynamic 3D interaction has witnessed great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former requires assigning precise physical properties to the target object, otherwise the simulated results would become unnatural. The latter tends to formulate the video with minor motions and discontinuous frames, due to the absence of physical constraints in deformation learning. We argue that video generative models, being trained on real-world captured data, are capable of judging physical phenomena in simulation environments. To this end, we propose DreamPhysics in this work, which estimates physical properties of 3D Gaussian Splatting with video diffusion priors. DreamPhysics supports both image- and text-conditioned guidance, optimizing physical parameters via score distillation sampling with frame interpolation and log gradient. Based on a material point method simulator with proper physical parameters, our method can generate 4D content with realistic motions. Experimental results demonstrate that, by distilling the prior knowledge of video diffusion models, inaccurate physical properties can be gradually refined for high-quality simulation. Codes are released at: https://github.com/tyhuang0428/DreamPhysics.
Submitted 3 June, 2024;
originally announced June 2024.
-
Color Shift Estimation-and-Correction for Image Enhancement
Authors:
Yiyu Li,
Ke Xu,
Gerhard Petrus Hancke,
Rynson W. H. Lau
Abstract:
Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate the color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts with respect to each other, which may not be easily normalized in joint modeling as they usually do not have "normal-exposed" regions/pixels as reference. In this paper, we propose a novel method to enhance images with both over- and under-exposures by learning to estimate and correct such color shifts. Specifically, we first derive the color feature maps of the brightened and darkened versions of the input image via a UNet-based network, followed by a pseudo-normal feature generator to produce pseudo-normal color feature maps. We then propose a novel COlor Shift Estimation (COSE) module to estimate the color shifts between the derived brightened (or darkened) color feature maps and the pseudo-normal color feature maps. The COSE module corrects the estimated color shifts of the over- and under-exposed regions separately. We further propose a novel COlor MOdulation (COMO) module to modulate the separately corrected colors in the over- and under-exposed regions to produce the enhanced image. Comprehensive experiments show that our method outperforms existing approaches. Project webpage: https://github.com/yiyulics/CSEC.
Submitted 29 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Authors:
Kin Wai Lau,
Yasar Abbas Ur Rehman,
Lai-Man Po
Abstract:
Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise convolutions with descending scales of k x k kernels into a cascade of two multi-branch depth-wise convolutions. The first multi-branch consists of parallel multi-scale 1 x k depth-wise convolutional layers followed by a similar multi-branch employing parallel multi-scale k x 1 depth-wise convolutional layers. This reduces computational and memory footprint while separating time and frequency processing of Mel-Spectrograms. The large kernels capture global frequencies and long activities, while small kernels get local frequencies and short activities. We also reparameterize the multi-branch design during inference to further boost speed without losing accuracy. Experiments show that AudioRepInceptionNeXt reduces parameters and computations by 50%+ and improves inference speed 1.28x over state-of-the-art CNNs like the Slow-Fast while maintaining comparable accuracy. It also learns robustly across a variety of audio recognition tasks. Codes are available at https://github.com/StevenLauHKHK/AudioRepInceptionNeXt.
Submitted 21 April, 2024;
originally announced April 2024.
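The AudioRepInceptionNeXt abstract above mentions reparameterizing a multi-branch design at inference time. The sketch below shows the generic idea for parallel 1-D branches whose outputs are summed: because convolution is linear, odd-length kernels of different sizes can be zero-padded to a common size and added into a single kernel. It is a simplified illustration that assumes no normalization or nonlinearity between branches; the helper names and toy weights are hypothetical, and the paper's actual procedure may differ.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel using zero 'same' padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]


def merge_branches(kernels):
    """Fold parallel odd-length kernels into one kernel of the largest size."""
    k_max = max(len(k) for k in kernels)
    merged = [0.0] * k_max
    for k in kernels:
        offset = (k_max - len(k)) // 2  # centre the smaller kernel
        for j, weight in enumerate(k):
            merged[offset + j] += weight
    return merged


x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
branches = [[0.5, -1.0, 0.25, 2.0, 1.0], [1.0, 0.0, -1.0], [2.0]]

# Training-time view: run every branch and sum the outputs.
multi_branch = [sum(vals) for vals in zip(*(conv1d_same(x, k) for k in branches))]
# Inference-time view: one convolution with the merged kernel.
single_branch = conv1d_same(x, merge_branches(branches))
```

The two results agree up to floating-point rounding, while the merged form needs a single pass over the input instead of one pass per branch.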
-
Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
Authors:
Zizhao Mo,
Huanle Xu,
Wing Cheong Lau
Abstract:
Ensuring the highest training throughput to maximize resource efficiency, while maintaining fairness among users, is critical for deep learning (DL) training in heterogeneous GPU clusters. However, current DL schedulers provide only limited fairness properties and suboptimal training throughput, impeding tenants from effectively leveraging heterogeneous resources. The underlying design challenge stems from inherent conflicts between efficiency and fairness properties.
In this paper, we introduce OEF, a new resource allocation framework specifically developed for achieving optimal resource efficiency and ensuring diverse fairness properties in heterogeneous GPU clusters. By integrating resource efficiency and fairness within a global optimization framework, OEF is capable of providing users with maximized overall efficiency, as well as various guarantees of fairness, in both cooperative and non-cooperative environments. We have implemented OEF in a cluster resource manager and conducted large-scale experiments, showing that OEF can improve the overall training throughput by up to 32% while improving fairness compared to state-of-the-art heterogeneity-aware schedulers.
Submitted 27 March, 2024;
originally announced March 2024.
-
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Authors:
Haoyuan Wang,
Wenbo Hu,
Lei Zhu,
Rynson W. H. Lau
Abstract:
Inverse rendering aims at recovering both geometry and materials of objects. It provides a more compatible reconstruction for conventional rendering engines, compared with the neural radiance fields (NeRFs). On the other hand, existing NeRF-based inverse rendering methods cannot handle glossy objects with local light interactions well, as they typically oversimplify the illumination as a 2D environmental map, which assumes infinite lights only. Observing the superiority of NeRFs in recovering radiance fields, we propose a novel 5D Neural Plenoptic Function (NeP) based on NeRFs and ray tracing, such that more accurate lighting-object interactions can be formulated via the rendering equation. We also design a material-aware cone sampling strategy to efficiently integrate lights inside the BRDF lobes with the help of pre-filtered radiance fields. Our method has two stages: the geometry of the target object and the pre-filtered environmental radiance fields are reconstructed in the first stage, and materials of the target object are estimated in the second stage with the proposed NeP and material-aware cone sampling strategy. Extensive experiments on the proposed real-world and synthetic datasets demonstrate that our method can reconstruct high-fidelity geometry/materials of challenging glossy objects with complex lighting interactions from nearby objects. Project webpage: https://whyy.site/paper/nep
Submitted 24 March, 2024;
originally announced March 2024.
-
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Authors:
Zhenwei Wang,
Tengfei Wang,
Gerhard Hancke,
Ziwei Liu,
Rynson W. H. Lau
Abstract:
Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets based on a few given exemplars with two goals: 1) unity for generating 3D assets that thematically align with the given exemplars and 2) diversity for generating 3D assets with a high degree of variation. To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse theme-aware 3D models with impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.
Submitted 15 May, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Authors:
Yuhao Liu,
Zhanghan Ke,
Fang Liu,
Nanxuan Zhao,
Rynson W. H. Lau
Abstract:
Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with handling diverse low-level tasks that require detail preservation. To overcome this limitation, we present a new Diff-Plugin framework to enable a single pre-trained diffusion model to generate high-fidelity results across a variety of low-level tasks. Specifically, we first propose a lightweight Task-Plugin module with a dual branch design to provide task-specific priors, guiding the diffusion process in preserving image content. We then propose a Plugin-Selector that can automatically select different Task-Plugins based on the text instruction, allowing users to edit images by indicating multiple low-level tasks with natural language. We conduct extensive experiments on 8 low-level vision tasks. The results demonstrate the superiority of Diff-Plugin over existing methods, particularly in real-world scenarios. Our ablations further validate that Diff-Plugin is stable, schedulable, and supports robust training across different dataset sizes.
Submitted 28 May, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
RelayAttention for Efficient Large Language Model Serving with Long System Prompts
Authors:
Lei Zhu,
Xinjiang Wang,
Wayne Zhang,
Rynson W. H. Lau
Abstract:
A practical large language model (LLM) service may involve a long system prompt, which specifies the instructions, examples, and knowledge documents of the task and is reused across requests. However, the long system prompt causes throughput/latency bottlenecks as the cost of generating the next token grows w.r.t. the sequence length. This paper aims to improve the efficiency of LLM services that involve long system prompts. Our key observation is that handling these system prompts requires heavily redundant memory accesses in existing causal attention computation algorithms. Specifically, for batched requests, the cached hidden states (i.e., key-value pairs) of system prompts are transferred from off-chip DRAM to on-chip SRAM multiple times, each corresponding to an individual request. To eliminate such a redundancy, we propose RelayAttention, an attention algorithm that allows reading these hidden states from DRAM exactly once for a batch of input tokens. RelayAttention is a free lunch: it maintains the generation quality while requiring no model retraining, as it is based on a mathematical reformulation of causal attention. We have observed significant performance improvements to a production-level system, vLLM, through integration with RelayAttention. The improvements are even more profound with longer system prompts.
Submitted 30 May, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
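The RelayAttention abstract above describes a mathematical reformulation of causal attention that lets the shared system-prompt key-value pairs be handled separately from per-request ones. One standard identity that makes such a split exact is sketched below: softmax attention over a concatenated key set equals a weighted mix of the attention over each part, with weights given by each part's log-sum-exp. This is a plain-Python illustration under that assumption, with hypothetical function names and toy data and with the usual 1/sqrt(d) scaling omitted; it is not the paper's kernel and says nothing about its memory-access pattern.

```python
import math


def attend(q, keys, values):
    """Softmax attention for one query; returns (output, log-sum-exp of scores)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def combine(out_a, lse_a, out_b, lse_b):
    """Merge two partial attention results into attention over the union of keys."""
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    return [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]


# Shared system-prompt cache (toy values) and two requests with their own context.
sys_keys = [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6]]
sys_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
requests = [
    {"q": [0.7, -0.1], "keys": [[0.2, 0.2]], "values": [[2.0, -1.0]]},
    {"q": [-0.3, 0.9], "keys": [[0.4, 0.1], [-0.2, 0.3]],
     "values": [[0.0, 3.0], [1.0, 1.0]]},
]

fused_outputs, reference_outputs = [], []
for r in requests:
    sys_part = attend(r["q"], sys_keys, sys_values)      # shared part
    own_part = attend(r["q"], r["keys"], r["values"])    # request-specific part
    fused_outputs.append(combine(*sys_part, *own_part))
    reference, _ = attend(r["q"], sys_keys + r["keys"], sys_values + r["values"])
    reference_outputs.append(reference)
```

Because the system-prompt part depends on the request only through its query, an implementation is free to process all queries in a batch against the shared keys and values together, which is the kind of restructuring the abstract alludes to.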
-
Delving into Dark Regions for Robust Shadow Detection
Authors:
Huankang Guan,
Ke Xu,
Rynson W. H. Lau
Abstract:
Shadow detection is a challenging task as it requires a comprehensive understanding of shadow characteristics and global/local illumination conditions. We observe from our experiment that state-of-the-art deep methods tend to have higher error rates in differentiating shadow pixels from non-shadow pixels in dark regions (i.e., regions with low-intensity values). Our key insight to this problem is that existing methods typically learn discriminative shadow features from the whole image globally, covering the full range of intensity values, and may not learn the subtle differences between shadow and non-shadow pixels in dark regions. Hence, if we can design a model to focus on a narrower range of low-intensity regions, it may be able to learn better discriminative features for shadow detection. Inspired by this insight, we propose a novel shadow detection approach that first learns global contextual cues over the entire image and then zooms into the dark regions to learn local shadow representations. To this end, we formulate an effective dark-region recommendation (DRR) module to recommend regions of low-intensity values, and a novel dark-aware shadow analysis (DASA) module to learn dark-aware shadow features from the recommended dark regions. Extensive experiments show that the proposed method outperforms the state-of-the-art methods on three popular shadow detection datasets. Code is available at https://github.com/guanhuankang/ShadowDetection2021.git.
Submitted 21 February, 2024;
originally announced February 2024.
-
Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding
Authors:
Yasar Abbas Ur Rehman,
Kin Wai Lau,
Yuyang Xie,
Lan Ma,
Jiajun Shen
Abstract:
The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However, few efforts have been made to investigate the SSL models in the FL regime for general-purpose audio understanding, especially when the training data is generated by large-scale heterogeneous audio sources. In this paper, we evaluate the performance of feature-matching and predictive audio-SSL techniques when integrated into large-scale FL settings simulated with non-independent and identically distributed (non-iid) data. We propose a novel Federated SSL (F-SSL) framework, dubbed FASSL, that enables learning intermediate feature representations from large-scale decentralized heterogeneous clients, holding unlabelled audio data. Our study has found that audio F-SSL approaches perform on par with the centralized audio-SSL approaches on the audio-retrieval task. Extensive experiments demonstrate the effectiveness and significance of FASSL as it assists in obtaining the optimal global model for state-of-the-art FL aggregation methods.
Submitted 5 February, 2024;
originally announced February 2024.
-
Recasting Regional Lighting for Shadow Removal
Authors:
Yuhao Liu,
Zhanghan Ke,
Ke Xu,
Fang Liu,
Zhenwei Wang,
Rynson W. H. Lau
Abstract:
Removing shadows requires an understanding of both lighting conditions and object textures in a scene. Existing methods typically learn pixel-level color mappings between shadow and non-shadow images, in which the joint modeling of lighting and object textures is implicit and inadequate. We observe that in a shadow region, the degradation degree of object textures depends on the local illumination, while simply enhancing the local illumination cannot fully recover the attenuated textures. Based on this observation, we propose to condition the restoration of attenuated textures on the corrected local lighting in the shadow region. Specifically, we first design a shadow-aware decomposition network to estimate the illumination and reflectance layers of shadow regions explicitly. We then propose a novel bilateral correction network to recast the lighting of shadow regions in the illumination layer via a novel local lighting correction module, and to restore the textures conditioned on the corrected illumination layer via a novel illumination-guided texture restoration module. We further annotate pixel-wise shadow masks for the public SRD dataset, which originally contains only image pairs. Experiments on three benchmarks show that our method outperforms existing state-of-the-art shadow removal methods.
Submitted 1 February, 2024;
originally announced February 2024.
-
Information Inequalities via Ideas from Additive Combinatorics
Authors:
Chin Wa Lau,
Chandra Nair
Abstract:
Ruzsa's equivalence theorem provided a framework for converting certain families of inequalities in additive combinatorics to entropic inequalities (which sometimes did not possess stand-alone entropic proofs). In this work, we first establish formal equivalences between some families (different from Ruzsa) of inequalities in additive combinatorics and entropic ones. As a first step to further these equivalences, we establish an information-theoretic characterization of the magnification ratio that could also be of independent interest.
Submitted 20 December, 2023; v1 submitted 18 December, 2023;
originally announced December 2023.
-
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Authors:
Tianyu Huang,
Yihan Zeng,
Zhilu Zhang,
Wan Xu,
Hang Xu,
Songcen Xu,
Rynson W. H. Lau,
Wangmeng Zuo
Abstract:
3D generation has attracted great attention in recent years. With the success of text-to-image diffusion models, the 2D-lifting technique has become a promising route to controllable 3D generation. However, these methods tend to present inconsistent geometry, which is also known as the Janus problem. We observe that the problem is caused mainly by two aspects, i.e., viewpoint bias in 2D diffusion models and overfitting of the optimization objective. To address it, we propose a two-stage 2D-lifting framework, namely DreamControl, which optimizes coarse NeRF scenes as 3D self-prior and then generates fine-grained objects with control-based score distillation. Specifically, adaptive viewpoint sampling and boundary integrity metric are proposed to ensure the consistency of generated priors. The priors are then regarded as input conditions to maintain reasonable geometries, in which conditional LoRA and weighted score are further proposed to optimize detailed textures. DreamControl can generate high-quality 3D content in terms of both geometry consistency and texture fidelity. Moreover, our control-based optimization guidance is applicable to more downstream tasks, including user-guided generation and 3D animation. The project page is available at https://github.com/tyhuang0428/DreamControl.
Submitted 12 March, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Mitigating Nonlinear Algorithmic Bias in Binary Classification
Authors:
Wendy Hui,
Wai Kwong Lau
Abstract:
This paper proposes the use of causal modeling to detect and mitigate algorithmic bias that is nonlinear in the protected attribute. We provide a general overview of our approach. We use the German Credit data set, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on age bias and the problem of binary classification. We show that the probability of getting correctly classified as "low risk" is lowest among young people. The probability increases with age nonlinearly. To incorporate the nonlinearity into the causal model, we introduce a higher order polynomial term. Based on the fitted causal model, the de-biased probability estimates are computed, showing improved fairness with little impact on overall classification accuracy. Causal modeling is intuitive and, hence, its use can enhance explicability and promote trust among different stakeholders of AI.
Submitted 7 May, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
[OIII] 5007 emissions in extremely red quasars (ERQs) are compact
Authors:
Marie Wingyee Lau,
Serena Perrotta,
Fred Hamann,
Jarred Gillette,
David S. N. Rupke,
Andrey Vayner,
Nadia L. Zakamska,
Dominika Wylezalek
Abstract:
"Extremely red quasars" (ERQs) are a non-radio-selected, intrinsically luminous population of quasars at cosmic noon selected by their extremely red colour from rest-frame UV to mid-IR. ERQs are uniquely associated with exceptionally broad and blueshifted [OIII] 5007 emission reaching speeds >6000 km s^-1. We obtained adaptive optics integral-field spectroscopic observations using Keck/OSIRIS and Gemini/NIFS of a sample of 10 ERQs with bolometric luminosities (10^47.0-10^47.9) erg s^-1 at z ~(2.3-3.0). The goal is to measure the sizes and spatially-resolved kinematics of the [OIII]-emitting regions. We study the surface brightness maps and aperture-extracted spectra and model the point-spread functions. We identify signs of merger activities in the continuum emissions. We identify physically distinct [OIII] kinematic components that are bimodal and respectively trace ERQ-driven outflows of velocity dispersion >250 km s^-1 and dynamically quiescent interstellar media. We find that the ERQ-driven ionized outflows are typically at ~1 kpc scales whereas the quiescent ionized gas extends to a few kpc. Compared to normal quasars the extremely fast ERQ-driven [OIII] outflows tend to be more compact, supporting the notion that ERQs are in a young stage of quasar/galaxy evolution and represent systems with unique physical conditions beyond orientation differences with normal quasar populations. The kinematically quiescent [OIII] emissions in ERQs tend to be spatially-resolved but less extended than in normal quasars, which can be explained by global and patchy dust obscuration. The hint of ionization cones suggests some of the obscuration can be partially explained by a patchy torus.
Submitted 27 June, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Adaptive Uncertainty Estimation via High-Dimensional Testing on Latent Representations
Authors:
Tsai Hor Chan,
Kin Wai Lau,
Jiajun Shen,
Guosheng Yin,
Lequan Yu
Abstract:
Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on uncertainty on discrete classification probabilities, which leads to poor generalizability to uncertainty estimation for other tasks. Moreover, most of the literature requires seeing the out-of-distribution (OOD) data in the training for better estimation of uncertainty, which limits the uncertainty estimation performance in practice because the OOD data are typically unseen. To overcome these limitations, we propose a new framework using data-adaptive high-dimensional hypothesis testing for uncertainty estimation, which leverages the statistical properties of the feature representations. Our method directly operates on latent representations and thus does not require retraining the feature encoder under a modified objective. The test statistic relaxes the feature distribution assumptions to high dimensionality, and it is more discriminative to uncertainties in the latent representations. We demonstrate that encoding features with Bayesian neural networks can enhance testing performance and lead to more accurate uncertainty estimation. We further introduce a family-wise testing procedure to determine the optimal threshold of OOD detection, which minimizes the false discovery rate (FDR). Extensive experiments validate the satisfactory performance of our framework on uncertainty estimation and task-specific prediction over a variety of competitors. The experiments on the OOD detection task also show satisfactory performance of our method when the OOD data are unseen in the training. Codes are available at https://github.com/HKU-MedAI/bnn_uncertainty.
Submitted 25 October, 2023;
originally announced October 2023.
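The abstract mentions choosing an OOD detection threshold with the false discovery rate (FDR) in mind, but does not spell out the procedure. As a rough illustration of what FDR-aware thresholding over per-sample p-values can look like, the sketch below uses the standard Benjamini-Hochberg step-up procedure. This is a swapped-in textbook technique, not the authors' family-wise testing procedure, and the function name and example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Standard Benjamini-Hochberg step-up procedure (illustrative only).

    Returns one flag per input p-value; True means the null hypothesis
    (here read as "this sample is in-distribution") is rejected.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, one per test sample.
    example = [0.041, 0.001, 0.205, 0.008]
    print(benjamini_hochberg(example, alpha=0.05))
```

In this reading, samples flagged True would be treated as OOD; how the paper's test statistic produces p-values from latent representations is not covered by this sketch.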
-
Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling
Authors:
Wendy Hui,
Wai Kwong Lau
Abstract:
This paper proposes the use of causal modeling to detect and mitigate algorithmic bias. We provide a brief description of causal modeling and a general overview of our approach. We then use the Adult dataset, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on gender bias and the problem of binary classification. We show that gender bias in the prediction model is statistically significant at the 0.05 level. We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation. Furthermore, we show that the overall classification accuracy is improved slightly. Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R. Hence, it enhances explainability and promotes trust.
Submitted 8 November, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields
Authors:
Tianyu Huang,
Yihan Zeng,
Bowen Dong,
Hang Xu,
Songcen Xu,
Rynson W. H. Lau,
Wangmeng Zuo
Abstract:
Recent works learn 3D representation explicitly under text-3D guidance. However, limited text-3D data restricts the vocabulary scale and text control of generations. Generators may easily fall into a stereotype concept for certain text prompts, thus losing open-vocabulary generation ability. To tackle this issue, we introduce a conditional 3D generative model, namely TextField3D. Specifically, rather than using the text prompts as input directly, we suggest to inject dynamic noise into the latent space of given text prompts, i.e., Noisy Text Fields (NTFs). In this way, limited 3D data can be mapped to the appropriate range of textual latent space that is expanded by NTFs. To this end, an NTFGen module is proposed to model general text latent code in noisy fields. Meanwhile, an NTFBind module is proposed to align view-invariant image latent code to noisy fields, further supporting image-conditional 3D generation. To guide the conditional generation in both geometry and texture, multi-modal discrimination is constructed with a text-3D discriminator and a text-2.5D discriminator. Compared to previous methods, TextField3D includes three merits: 1) large vocabulary, 2) text consistency, and 3) low latency. Extensive experiments demonstrate that our method achieves a potential open-vocabulary 3D generation capability.
Submitted 14 March, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
WTH! Wok the Hydrogen: Measurement of Galactic Neutral Hydrogen in Noisy Urban Environment Using Kitchenware
Authors:
Leo W. H. Fung,
Albert Wai Kit Lau,
Ka Hung Chan,
Ming Tony Shing
Abstract:
Astronomy observation is difficult in urban environments due to the background noise generated by human activities. Consequently, promoting astronomy in metropolitan areas is challenging. In this work, we propose a low-cost, educational experiment called Wok the Hydrogen (WTH) that offers opportunities for scientific observation in urban environments, specifically the observation of the $21$ cm ($f_{21} = 1420.4$ MHz) emission from neutral hydrogen in the Milky Way. We demonstrate how to construct a radio telescope using kitchenware, along with additional electronic equipment that can be easily purchased online. The total system cost is controlled within 150 dollars. We also outline the subsequent data analysis procedures for deriving the recession velocity of galactic hydrogen from the raw data. The system was tested on the campus of the Hong Kong University of Science and Technology, which is located approximately 2 km northeast of the nearest residential area with a population of 0.4 million and about 10 km east of the downtown area with a population of 2 million. We show that a precision of $Δv \approx \pm 20$ km s$^{-1}$ can be achieved for determining the recession velocity of neutral hydrogen with this relatively simple setup, and the precision can be further improved with longer exposure time.
Submitted 28 September, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.
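The velocity derivation described in the WTH abstract rests on the Doppler shift of the 21 cm line. A minimal sketch of that conversion follows, assuming the non-relativistic Doppler formula; the function names are hypothetical, and the authors' actual pipeline (calibration, reference-frame corrections) is not described in the abstract. Only the rest frequency of 1420.4 MHz and the quoted precision of about 20 km/s come from the text.

```python
C_KM_S = 299_792.458   # speed of light in km/s
F21_MHZ = 1420.4       # rest frequency of the 21 cm line, as quoted in the abstract


def recession_velocity_km_s(f_obs_mhz: float, f_rest_mhz: float = F21_MHZ) -> float:
    """Non-relativistic Doppler velocity; positive means receding (redshifted)."""
    return C_KM_S * (f_rest_mhz - f_obs_mhz) / f_rest_mhz


def frequency_shift_mhz(dv_km_s: float, f_rest_mhz: float = F21_MHZ) -> float:
    """Frequency shift corresponding to a velocity difference dv."""
    return f_rest_mhz * dv_km_s / C_KM_S


if __name__ == "__main__":
    # Hypothetical observed peak 0.2 MHz below the rest frequency.
    print(f"v = {recession_velocity_km_s(1420.2):.1f} km/s")
    # Frequency scale matching the quoted velocity precision of 20 km/s.
    print(f"df = {frequency_shift_mhz(20.0) * 1e3:.1f} kHz")
```

Under this approximation, a velocity precision of 20 km/s corresponds to locating the line peak to within roughly 95 kHz.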
-
Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN
Authors:
Kin Wai Lau,
Lai-Man Po,
Yasar Abbas Ur Rehman
Abstract:
Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance, that surpasses Vision Transformers (ViTs), on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in the computational and memory footprints with increasing convolutional kernel size. To mitigate these problems and to enable the use of extremely large convolutional kernels in the attention modules of VAN, we propose a family of Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded horizontal and vertical 1-D kernels. In contrast to the standard LKA design, the proposed decomposition enables the direct use of the depth-wise convolutional layer with large kernels in the attention module, without requiring any extra blocks. We demonstrate that the proposed LSKA module in VAN can achieve comparable performance with the standard LKA module and incur lower computational complexity and memory footprints. We also find that the proposed LSKA design biases the VAN more toward the shape of the object than the texture with increasing kernel size. Additionally, we benchmark the robustness of the LKA and LSKA in VAN, ViTs, and the recent ConvNeXt on the five corrupted versions of the ImageNet dataset that are largely unexplored in the previous works. Our extensive experimental results show that the proposed LSKA module in VAN provides a significant reduction in computational complexity and memory footprints with increasing kernel size while outperforming ViTs, ConvNeXt, and providing similar performance compared to the LKA module in VAN on object recognition, object detection, semantic segmentation, and robustness tests.
Submitted 19 October, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
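The cost argument in the LSKA abstract can be made concrete by counting weights: a k x k depth-wise kernel needs k^2 weights per channel, while a cascaded 1 x k and k x 1 pair needs 2k. The sketch below is a simplified count under stated assumptions: it ignores biases, dilation, and any pointwise convolutions that the actual LKA and LSKA modules may include, and the function names, channel count, and kernel sizes are illustrative rather than taken from the paper.

```python
def depthwise_2d_weights(kernel_size: int, channels: int) -> int:
    """Weights in a depth-wise k x k convolution (one k x k filter per channel)."""
    return channels * kernel_size * kernel_size


def depthwise_separable_1d_weights(kernel_size: int, channels: int) -> int:
    """Weights in cascaded depth-wise 1 x k and k x 1 convolutions."""
    return channels * 2 * kernel_size


if __name__ == "__main__":
    channels = 64  # illustrative channel count
    for k in (7, 23, 35, 53):  # illustrative kernel sizes
        full = depthwise_2d_weights(k, channels)
        separable = depthwise_separable_1d_weights(k, channels)
        print(f"k={k:2d}: 2D={full:7d}  separable={separable:5d}  ratio={full / separable:.1f}x")
```

The ratio between the two counts is k/2, which is why the saving grows as the kernel gets larger.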
-
Language-based Photo Color Adjustment for Graphic Designs
Authors:
Zhenwei Wang,
Nanxuan Zhao,
Gerhard Hancke,
Rynson W. H. Lau
Abstract:
Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuitive system that can assist both experts and novices on graphic design. Given a graphic design containing a photo that needs to be recolored, our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction. The multi-granularity of the instruction allows diverse user intentions. The proposed novel task faces several unique challenges, including: 1) color accuracy for recoloring with exactly the same color from the target design element as specified by the user; 2) multi-granularity instructions for parsing instructions correctly to generate a specific result or multiple plausible ones; and 3) locality for recoloring in semantically meaningful local regions to preserve original image semantics. To address these challenges, we propose a model called LangRecol with two main components: the language-based source color prediction module and the semantic-palette-based photo recoloring module. We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training. We evaluate our model via extensive experiments and user studies. We also discuss several practical applications, showing the effectiveness and practicality of our approach. Code and data for this paper are at: https://zhenwwang.github.io/langrecol.
Submitted 6 August, 2023;
originally announced August 2023.
-
Lighting up NeRF via Unsupervised Decomposition and Enhancement
Authors:
Haoyuan Wang,
Xiaogang Xu,
Ke Xu,
Rynson WH. Lau
Abstract:
Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed from a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement methods with NeRF methods also does not work well due to the view inconsistency caused by the individual 2D enhancement process. In this paper, we propose a novel approach, called Low-Light NeRF (or LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. The core of our approach is a decomposition of radiance field learning, which allows us to enhance the illumination, reduce noise and correct the distorted colors jointly with the NeRF optimization process. Our method is able to produce novel view images with proper lighting and vivid colors and details, given a collection of camera-finished low dynamic range (8-bits/channel) images from a low-light scene. Experiments demonstrate that our method outperforms existing low-light enhancement methods and NeRF methods.
Submitted 20 July, 2023;
originally announced July 2023.
-
AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023
Authors:
Kin Wai Lau,
Yasar Abbas Ur Rehman,
Yuyang Xie,
Lan Ma
Abstract:
This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-spectrogram of the audio samples. Motivated by the design of the InceptionNeXt, we propose parallel multi-scale depthwise separable convolutional kernels in the AudioInceptionNeXt block, which enable the model to learn the time and frequency information more effectively. The large-scale separable kernels capture the long duration of activities and the global frequency semantic information, while the small-scale separable kernels capture the short duration of activities and local details of frequency information. Our approach achieved 55.43% of top-1 accuracy on the challenge test set, ranked as 1st on the public leaderboard. Codes are available anonymously at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.
Submitted 14 July, 2023;
originally announced July 2023.
-
Accurate Systemic Redshifts and Outflow Speeds for Extremely Red Quasars (ERQs)
Authors:
Jarred Gillette,
Fred Hamann,
Marie Wingyee Lau,
Serena Perrotta
Abstract:
Extremely Red Quasars (ERQs) are thought to represent a brief episode of young quasar and galactic evolution characterized by rapid outflows and obscured growth due to dusty environments. We use new redshift measurements from CO and Ly$α$ emission-lines to better constrain outflow velocities from previous line measurements. We present a sample of 82 ERQs, and the analysis confirms that ERQs have a higher incidence of large CIV blueshifts, accompanied by large Rest Equivalent Widths (REWs) and smaller line widths than blue quasars. We find that strong blueshifts (>2000 km s$^{-1}$) are present in 12/54 (22.22 per cent) of ERQs with the most robust redshift indicators. At least 4 out of 15 ERQs in the sample also have blueshifts in their H$β$ and low-ionization UV lines ranging from $-$500 to $-$1500 km s$^{-1}$. ERQs with strong CIV blueshifts are substantially offset in CIV REW and Full-Width at Half-Maximum (FWHM) from typical blue quasars in the same velocity range. ERQs have average values of REW = 124 A and FWHM = 5274 km s$^{-1}$, while blue quasars have REW = 24 A and FWHM = 6973 km s$^{-1}$. The extreme nature of the outflows in ERQs might explain some of their other spectral properties, such as the large CIV REWs and peculiar wingless profiles owing to more extended broad-line regions participating in outflows. The physical reasons for the extreme outflow properties of ERQs are unclear; however, larger Eddington ratios and/or softer ionizing spectra incident on the outflow gas cannot be ruled out.
Submitted 18 May, 2023;
originally announced May 2023.
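Blueshifts of the kind quoted in this abstract are velocity offsets of an emission line relative to the systemic redshift. The sketch below shows one common way to compute such an offset, assuming the first-order relation v = c (z_line - z_sys) / (1 + z_sys); the abstract does not state which convention the authors use, and the function names and example redshifts are hypothetical. Only the 2000 km/s threshold for a "strong" blueshift comes from the text.

```python
C_KM_S = 299_792.458          # speed of light in km/s
STRONG_BLUESHIFT_KM_S = 2000  # threshold for "strong" blueshifts quoted in the abstract


def line_velocity_km_s(z_line: float, z_systemic: float) -> float:
    """Velocity offset of a line relative to systemic; negative means blueshifted."""
    return C_KM_S * (z_line - z_systemic) / (1.0 + z_systemic)


def is_strong_blueshift(z_line: float, z_systemic: float) -> bool:
    """True when the line is blueshifted by more than the quoted threshold."""
    return line_velocity_km_s(z_line, z_systemic) < -STRONG_BLUESHIFT_KM_S


if __name__ == "__main__":
    # Hypothetical redshifts: line measured at z = 2.58, systemic z = 2.60.
    v = line_velocity_km_s(2.58, 2.60)
    print(f"offset = {v:.0f} km/s, strong blueshift: {is_strong_blueshift(2.58, 2.60)}")
```

The accuracy of the result depends directly on the systemic redshift, which is why the abstract emphasizes new CO and Ly$α$ redshift measurements.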
-
Initial On-Sky Performance testing of the Single-Photon Imager for Nanosecond Astrophysics (SPINA) system
Authors:
Albert Wai Kit Lau,
Nurzhan Shaimoldin,
Zhanat Maksut,
Yan Yan Chan,
Mehdi Shafiee,
Bruce Grossan,
George F. Smoot
Abstract:
This work presents an initial on-sky performance measurement of the Single-Photon Imager for Nanosecond Astrophysics (SPINA) system, part of our Ultra-Fast Astronomy (UFA) program. We developed the SPINA system based on the position-sensitive silicon photomultiplier (PS-SiPM) detector to record both photoelectron (P.E.) temporal and spatial information. The initial on-sky testing of the SPINA system was successfully performed on UT 2022 Jul 10, on the 0.7-meter aperture Nazarbayev University Transient Telescope at the Assy-Turgen Astrophysical Observatory (NUTTelA-TAO). We measured stars with a wide range of brightness and a dark region of the sky without stars $< 18$ mag. We measured the SPINA system's spatial resolution to be $<232μm$ (full-width half-maximum, FWHM), limited by the unstable atmosphere. We measured the total background noise (detector dark counts and sky background) of 1914 counts per second (cps) within this resolution element. We also performed a crosstalk mapping of the detector, obtaining the crosstalk probability of $\sim0.18$ near the detector's center while reaching $\sim 50\%$ at the edges. We derived a $5σ$ sensitivity of $17.45$ Gaia-BP magnitude in a 1s exposure with no atmospheric extinction by comparing the received flux with Gaia-BP band data. For a $10ms$ window and a false alarm rate of once per 100 nights, we derived a transient sensitivity of 14.06 mag. For a $1μs$ or faster time scale, we are limited by crosstalk to a 15 P.E. detection threshold. In addition, we demonstrated that the SPINA system is capable of capturing changes in the stellar profile FWHM of $\pm1.8\%$ and $\pm5\%$ in $20ms$ and $2ms$ exposures, respectively, as well as capturing stellar light curves on the $ms$ and $μs$ scales.
Submitted 9 May, 2023;
originally announced May 2023.
-
Neural Preset for Color Style Transfer
Authors:
Zhanghan Ke,
Yuhao Liu,
Lei Zhu,
Nanxuan Zhao,
Rynson W. H. Lau
Abstract:
In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding artifacts and supporting high-resolution inputs with a small memory footprint. Second, we develop a two-stage pipeline by dividing the task into color normalization and stylization, which allows efficient style switching by extracting color styles as presets and reusing them on normalized input images. Due to the unavailability of pairwise datasets, we describe how to train Neural Preset via a self-supervised strategy. Various advantages of Neural Preset over existing methods are demonstrated through comprehensive evaluations. Notably, Neural Preset enables stable 4K color style transfer in real-time without artifacts. Besides, we show that our trained model can naturally support multiple applications without fine-tuning, including low-light image enhancement, underwater image correction, image dehazing, and image harmonization. Project page with demos: https://zhkkke.github.io/NeuralPreset .
Submitted 24 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Compact and Quiescent Circumgalactic Medium and Ly$α$ Halos around Extremely Red Quasars (ERQs)
Authors:
Jarred Gillette,
Marie Wingyee Lau,
Fred Hamann,
Serena Perrotta,
David S. N. Rupke,
Dominika Wylezalek,
Nadia L. Zakamska,
Andrey Vayner
Abstract:
Red quasars may represent a young stage of galaxy evolution that provide important feedback to their host galaxies. We are studying a population of extremely red quasars (ERQs) with exceptionally fast and powerful outflows, at median redshift $z$ = 2.6. We present Keck/KCWI integral field spectra of 11 ERQs, which have a median color $i-W3$ = 5.9~mag, median $\left\langle L_{\text{bol}} \right\rangle$ $\approx$ 5 $\times$ $10^{47}$ erg s$^{-1}$, Ly$α$ halo luminosity $\left\langle L_{\text{halo}} \right\rangle$ $=$ 5 $\times$ $10^{43}$ erg s$^{-1}$, and maximum linear size $>128$ kpc. The ERQ halos are generally similar to those of blue quasars, following known trends with $L_{\text{bol}}$ in halo properties. ERQs have halo symmetries similar to Type-I blue quasars, suggesting Type-I spatial orientations. ERQ $\left\langle L_{\text{halo}} \right\rangle$ is $\sim$2 dex below blue quasars, which is marginal due to scatter, but consistent with obscuration lowering photon escape fractions. ERQ halos tend to have more compact and circularly symmetric inner regions than blue quasars, with median exponential scale lengths of $\sim$9 kpc, compared to $\sim$16 kpc for blue quasars. When we include the central regions not available in blue quasar studies (due to PSF problems), the true median ERQ halo scale length is just $\sim$6 kpc. ERQ halos are also kinematically quiet, with median velocity dispersion 293 km s$^{-1}$, consistent with expected virial speeds. Overall we find no evidence for feedback on circumgalactic scales, and the current episode of quasar activity, perhaps due to long outflow travel times, has not been around long enough to affect the circumgalactic medium. We confirm the narrow Ly$α$ emission spikes found in ERQ aperture spectra are halo features, and are useful for systemic redshifts and measuring outflow speeds in other features.
Submitted 22 March, 2023;
originally announced March 2023.
-
Structure-Informed Shadow Removal Networks
Authors:
Yuhao Liu,
Qing Guo,
Lan Fu,
Zhanghan Ke,
Ke Xu,
Wei Feng,
Ivor W. Tsang,
Rynson W. H. Lau
Abstract:
Existing deep learning-based shadow removal methods still produce images with shadow remnants. These shadow remnants typically exist in homogeneous regions with low-intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (in which humans perceive object shapes and continuous colors). Hence, in this paper, we propose to remove shadows at the image structure level. Based on this idea, we propose a novel structure-informed shadow removal network (StructNet) to leverage the image-structure information to address the shadow remnant problem. Specifically, StructNet first reconstructs the structure information of the input image without shadows and then uses the restored shadow-free structure prior to guide the image-level shadow removal. StructNet contains two main novel modules: (1) a mask-guided shadow-free extraction (MSFE) module to extract image structural features in a non-shadow-to-shadow directional manner, and (2) a multi-scale feature & residual aggregation (MFRA) module to leverage the shadow-free structure information to regularize feature consistency. In addition, we also propose to extend StructNet to exploit multi-level structure information (MStructNet), to further boost the shadow removal performance with minimum computational overheads. Extensive experiments on three shadow removal benchmarks demonstrate that our method outperforms existing shadow removal methods, and our StructNet can be integrated with existing methods to improve them further.
Submitted 1 February, 2024; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Efficient Mirror Detection via Multi-level Heterogeneous Learning
Authors:
Ruozhen He,
Jiaying Lin,
Rynson W. H. Lau
Abstract:
We present HetNet (Multi-level \textbf{Het}erogeneous \textbf{Net}work), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting the real-time applications (such as drones). Their lack of efficiency is caused by the common design of adopting homogeneous modules at different levels, which ignores the difference between different levels of features. In contrast, HetNet detects potential mirror regions initially through low-level understandings (\textit{e.g.}, intensity contrasts) and then combines with high-level understandings (contextual discontinuity for instance) to finalize the predictions. To perform accurate yet efficient mirror detection, HetNet follows an effective architecture that obtains specific information at different stages to detect mirrors. We further propose a multi-orientation intensity-based contrasted module (MIC) and a reflection semantic logical module (RSL), equipped on HetNet, to predict potential mirror regions by low-level understandings and analyze semantic logic in scenarios by high-level understandings, respectively. Compared to the state-of-the-art method, HetNet runs 664$\%$ faster and draws an average performance gain of 8.9$\%$ on MAE, 3.1$\%$ on IoU, and 2.0$\%$ on F-measure on two mirror detection benchmarks.
Submitted 28 November, 2022;
originally announced November 2022.
-
Inverse clustering of Gibbs Partitions via independent fragmentation and dual dependent coagulation operators
Authors:
Man Wai Ho,
Lancelot F. James,
John W. Lau
Abstract:
Gibbs partitions of the integers generated by stable subordinators of index $α\in(0,1)$ form remarkable classes of random partitions where in principle much is known about their properties, including practically effortless obtainment of otherwise complex asymptotic results potentially relevant to applications in general combinatorial stochastic processes, random tree/graph growth models and Bayesian statistics. This class includes the well-known models based on the two-parameter Poisson-Dirichlet distribution which forms the bulk of explicit applications. This work continues efforts to provide interpretations for larger classes of Gibbs partitions by embedding important operations within this framework. Here we address the formidable problem of extending the dual, infinite-block, coagulation/fragmentation results of Jim Pitman (1999, Annals of Probability), where in terms of coagulation they are based on independent two-parameter Poisson-Dirichlet distributions, to all such Gibbs (stable Poisson-Kingman) models. Our results create nested families of Gibbs partitions, and corresponding mass partitions, over any $0<β<α<1.$ We primarily focus on the fragmentation operations, which remain independent in this setting, and corresponding remarkable calculations for Gibbs partitions derived from that operation. We also present definitive results for the dual coagulation operations, now based on our construction of dependent processes, and demonstrate its relatively simple application in terms of Mittag-Leffler and generalized gamma models. The latter demonstrates another approach to recover the duality results in Pitman (1999).
Submitted 21 November, 2022;
originally announced November 2022.
-
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training
Authors:
Tianyu Huang,
Bowen Dong,
Yunhan Yang,
Xiaoshui Huang,
Rynson W. H. Lau,
Wanli Ouyang,
Wangmeng Zuo
Abstract:
Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language pre-training models to 3D vision. PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification. However, its performance is restricted by the domain gap between rendered depth maps and images, as well as the diversity of depth distributions. To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification. We introduce a new depth rendering setting that forms a better visual effect, and then render 52,460 pairs of images and depth maps from ShapeNet for pre-training. The pre-training scheme of CLIP2Point combines cross-modality learning to enforce the depth features for capturing expressive visual and textual features and intra-modality learning to enhance the invariance of depth aggregation. Additionally, we propose a novel Dual-Path Adapter (DPA) module, i.e., a dual-path structure with simplified adapters for few-shot learning. The dual-path structure allows the joint use of CLIP and CLIP2Point, and the simplified adapter can well fit few-shot tasks without post-search. Experimental results show that CLIP2Point is effective in transferring CLIP knowledge to 3D vision. Our CLIP2Point outperforms PointCLIP and other self-supervised 3D networks, achieving state-of-the-art results on zero-shot and few-shot classification.
Submitted 22 August, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Continual learning benefits from multiple sleep mechanisms: NREM, REM, and Synaptic Downscaling
Authors:
Brian S. Robinson,
Clare W. Lau,
Alexander New,
Shane M. Nichols,
Erik C. Johnson,
Michael Wolmetz,
William G. Coon
Abstract:
Learning new tasks and skills in succession without losing prior learning (i.e., catastrophic forgetting) is a computational challenge for both artificial and biological neural networks, yet artificial systems struggle to achieve parity with their biological analogues. Mammalian brains employ numerous neural operations in support of continual learning during sleep. These are ripe for artificial adaptation. Here, we investigate how modeling three distinct components of mammalian sleep together affects continual learning in artificial neural networks: (1) a veridical memory replay process observed during non-rapid eye movement (NREM) sleep; (2) a generative memory replay process linked to REM sleep; and (3) a synaptic downscaling process which has been proposed to tune signal-to-noise ratios and support neural upkeep. We find benefits from the inclusion of all three sleep components when evaluating performance on a continual learning CIFAR-100 image classification benchmark. Maximum accuracy improved during training and catastrophic forgetting was reduced during later tasks. While some catastrophic forgetting persisted over the course of network training, higher levels of synaptic downscaling led to better retention of early tasks and further facilitated the recovery of early task accuracy during subsequent training. One key takeaway is that there is a trade-off when considering the level of synaptic downscaling to use: more aggressive downscaling better protects early tasks, but less downscaling enhances the ability to learn new tasks. Intermediate levels can strike a balance with the highest overall accuracies during training. Overall, our results both provide insight into how to adapt sleep components to enhance artificial continual learning systems and highlight areas for future neuroscientific sleep research to further such systems.
Submitted 9 September, 2022;
originally announced September 2022.
-
Large-Field Contextual Feature Learning for Glass Detection
Authors:
Haiyang Mei,
Xin Yang,
Letian Yu,
Qiang Zhang,
Xiaopeng Wei,
Rynson W. H. Lau
Abstract:
Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we pose the important problem of detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field-of-view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that our GDNet-B achieves satisfying glass detection results on the images within and beyond the GDD testing set. We further validate the effectiveness and generalization capability of our proposed GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show the potential applications of glass detection and discuss possible future research directions.
Submitted 10 September, 2022;
originally announced September 2022.
-
Rain Removal from Light Field Images with 4D Convolution and Multi-scale Gaussian Process
Authors:
Tao Yan,
Mingyue Li,
Bin Li,
Yang Yang,
Rynson W. H. Lau
Abstract:
Existing deraining methods focus mainly on a single input image. However, with just a single input image, it is extremely difficult to accurately detect and remove rain streaks in order to restore a rain-free image. In contrast, a light field image (LFI) embeds abundant 3D structure and texture information of the target scene by recording the direction and position of each incident ray via a plenoptic camera. LFIs are becoming popular in the computer vision and graphics communities. However, making full use of the abundant information available from LFIs, such as the 2D array of sub-views and the disparity map of each sub-view, for effective rain removal is still a challenging problem. In this paper, we propose a novel method, 4D-MGP-SRRNet, for rain streak removal from LFIs. Our method takes as input all sub-views of a rainy LFI. To make full use of the LFI, it adopts 4D convolutional layers to simultaneously process all sub-views of the LFI. In the pipeline, the rain detection network, MGPDNet, with a novel Multi-scale Self-guided Gaussian Process (MSGP) module is proposed to detect high-resolution rain streaks from all sub-views of the input LFI at multiple scales. Semi-supervised learning is introduced for MSGP to accurately detect rain streaks by training on both virtual-world rainy LFIs and real-world rainy LFIs at multiple scales via computing pseudo ground truths for real-world rain streaks. We then feed all sub-views, with the predicted rain streaks subtracted, into a 4D convolution-based Depth Estimation Residual Network (DERNet) to estimate the depth maps, which are later converted into fog maps. Finally, all sub-views concatenated with the corresponding rain streaks and fog maps are fed into a powerful rainy LFI restoring model based on the adversarial recurrent neural network to progressively eliminate rain streaks and recover the rain-free LFI.
Submitted 27 January, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Theory of the force of Friction Acting on Water Chains Flowing through Carbon Nanotubes
Authors:
J. B. Sokoloff,
A. W. C. Lau
Abstract:
A simple model for the friction experienced by the one-dimensional water chains that flow through subnanometer-diameter carbon nanotubes is studied. The model is based on a lowest-order perturbation theory treatment of the friction experienced by the water chains due to phonon and electron excitations in both the nanotube and the water chain, as a result of the motion of the chain. On the basis of this model, we are able to demonstrate that the observed flow velocities of water chains through carbon nanotubes, of the order of cm/s, can be accounted for if the nanotube is metallic. If it is insulating, however, our calculations imply that the flow velocity of the water could be much larger for the pressure gradient in experimental studies of water flow through subnanometer-diameter nanotubes.
Submitted 24 January, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
Authors:
Ruozhen He,
Qihua Dong,
Jiaying Lin,
Rynson W. H. Lau
Abstract:
Existing camouflaged object detection (COD) methods rely heavily on large-scale datasets with pixel-wise annotations. However, due to the ambiguous boundary, annotating camouflaged objects pixel-wise is very time-consuming and labor-intensive, taking ~60 minutes to label one image. In this paper, we propose the first weakly-supervised COD method, using scribble annotations as supervision. To achieve this, we first relabel 4,040 images in existing camouflaged object datasets with scribbles, which takes ~10 s to label one image. As scribble annotations only describe the primary structure of objects without details, for the network to learn to localize the boundaries of camouflaged objects, we propose a novel consistency loss composed of two parts: a cross-view loss to attain reliable consistency over different images, and an inside-view loss to maintain consistency inside a single prediction map. Besides, we observe that humans use semantic information to segment regions near the boundaries of camouflaged objects. Hence, we further propose a feature-guided loss, which includes visual features directly extracted from images and semantically significant features captured by the model. Finally, we propose a novel network for COD via scribble learning on structural information and semantic relations. Our network has two novel modules: the local-context contrasted (LCC) module, which mimics visual inhibition to enhance image contrast/sharpness and expand the scribbles into potential camouflaged regions, and the logical semantic relation (LSR) module, which analyzes the semantic relation to determine the regions representing the camouflaged object. Experimental results show that our model outperforms relevant SOTA methods on three COD benchmarks with an average improvement of 11.0% on MAE, 3.2% on S-measure, 2.5% on E-measure, and 4.4% on weighted F-measure.
Submitted 28 November, 2022; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Symmetry-Aware Transformer-based Mirror Detection
Authors:
Tianyu Huang,
Bowen Dong,
Jiaying Lin,
Xiaohui Liu,
Rynson W. H. Lau,
Wangmeng Zuo
Abstract:
Mirror detection aims to identify the mirror regions in the given input image. Existing works mainly focus on integrating the semantic features and structural features to mine specific relations between mirror and non-mirror regions, or introducing mirror properties like depth or chirality to help analyze the existence of mirrors. In this work, we observe that a real object typically forms a loose symmetry relationship with its corresponding reflection in the mirror, which is beneficial in distinguishing mirrors from real objects. Based on this observation, we propose a dual-path Symmetry-Aware Transformer-based mirror detection Network (SATNet), which includes two novel modules: Symmetry-Aware Attention Module (SAAM) and Contrast and Fusion Decoder Module (CFDM). Specifically, we first adopt a transformer backbone to model global information aggregation in images, extracting multi-scale features in two paths. We then feed the high-level dual-path features to SAAMs to capture the symmetry relations. Finally, we fuse the dual-path features and refine our prediction maps progressively with CFDMs to obtain the final mirror mask. Experimental results show that SATNet outperforms both RGB and RGB-D mirror detection methods on all available mirror detection datasets. Codes and trained models are available at: https://github.com/tyhuang0428/SATNet.
Submitted 4 September, 2022; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Harmonizer: Learning to Perform White-Box Image and Video Harmonization
Authors:
Zhanghan Ke,
Chunyi Sun,
Lei Zhu,
Ke Xu,
Rynson W. H. Lau
Abstract:
Recent works on image harmonization solve the problem as a pixel-wise image translation task via large autoencoders. They have unsatisfactory performances and slow inference speeds when dealing with high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from the composite ones. Hence, we frame image harmonization as an image-level regression problem to learn the arguments of the filters that humans use for the task. We present a Harmonizer framework for image harmonization. Unlike prior methods that are based on black-box autoencoders, Harmonizer contains a neural network for filter argument prediction and several white-box filters (based on the predicted arguments) for image harmonization. We also introduce a cascade regressor and a dynamic loss strategy for Harmonizer to learn filter arguments more stably and precisely. Since our network only outputs image-level arguments and the filters we used are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer surpasses existing methods notably, especially with high-resolution inputs. Finally, we apply Harmonizer to video harmonization, which achieves consistent results across frames and 56 fps at 1080P resolution. Code and models are available at: https://github.com/ZHKKKe/Harmonizer.
Submitted 20 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Depth-aware Glass Surface Detection with Cross-modal Context Mining
Authors:
Jiaying Lin,
Yuen Hei Yeung,
Rynson W. H. Lau
Abstract:
Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This however poses substantial challenges on the operations of autonomous systems such as robots, self-driving cars and drones, as the glass panels can become transparent obstacles to the navigation. Existing works attempt to exploit various cues, including glass boundary context or reflections, as a prior. However, they are all based on input RGB images. We observe that the transmission of 3D depth sensor light through glass surfaces often produces blank regions in the depth maps, which can offer additional insights to complement the RGB image features for glass surface detection. In this paper, we propose a novel framework for glass surface detection by incorporating RGB-D information, with two novel modules: (1) a cross-modal context mining (CCM) module to adaptively learn individual and mutual context features from RGB and depth information, and (2) a depth-missing aware attention (DAA) module to explicitly exploit spatial locations where missing depths occur to help detect the presence of glass surfaces. In addition, we propose a large-scale RGB-D glass surface detection dataset, called RGB-D GSD, for RGB-D glass surface detection. Our dataset comprises 3,009 real-world RGB-D glass surface images with precise annotations. Extensive experimental results show that our proposed model outperforms state-of-the-art methods.
Submitted 22 June, 2022;
originally announced June 2022.
-
Atomtronic multi-terminal Aharonov-Bohm interferometer
Authors:
Jonathan Wei Zhong Lau,
Koon Siang Gan,
Rainer Dumke,
Luigi Amico,
Leong-Chuan Kwek,
Tobias Haug
Abstract:
We study a multi-functional device for cold atoms consisting of a three-terminal ring circuit pierced by a synthetic magnetic flux, where the ring can be continuous or discretized. The flux controls the atomic current through the ring via the Aharonov-Bohm effect. Our device shows a flux-induced transition of reflections from an Andreev-like negative density to positive density. Further, the flux can direct the atomic current into specific output ports, realizing a flexible non-reciprocal switch to connect multiple atomic systems or sense rotations. By changing the flux linearly in time, we convert constant matter wave currents into an AC modulated current. This effect can be used to realize an atomic frequency generator and study fundamental problems related to the Aharonov-Bohm effect. We experimentally demonstrate Bose-Einstein condensation into the light-shaped optical potential of the three-terminal ring. Our work opens up the possibility of novel atomtronic devices for practical applications in quantum technologies.
Submitted 15 May, 2023; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Convex Optimization for Nonequilibrium Steady States on a Hybrid Quantum Processor
Authors:
Jonathan Wei Zhong Lau,
Kian Hwee Lim,
Kishor Bharti,
Leong-Chuan Kwek,
Sai Vinjanampathy
Abstract:
Finding the transient and steady state properties of open quantum systems is a central problem in various fields of quantum technologies. Here, we present a quantum-assisted algorithm to determine the steady states of open system dynamics. By reformulating the problem of finding the fixed point of Lindblad dynamics as a feasibility semidefinite program, we bypass several well-known issues with variational quantum approaches to solving for steady states. We demonstrate that our hybrid approach allows us to estimate the steady states of higher dimensional open quantum systems and discuss how our method can find multiple steady states for systems with symmetries.
Submitted 7 July, 2023; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Rethinking Video Salient Object Ranking
Authors:
Jiaying Lin,
Huankang Guan,
Rynson W. H. Lau
Abstract:
Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image. Most recently, a method is proposed for ranking salient objects in an input video based on a predicted fixation map. It relies solely on the density of the fixations within the salient objects to infer their saliency ranks, which is incompatible with human perception of saliency ranking. In this work, we propose to explicitly learn the spatial and temporal relations between different salient objects to produce the saliency ranks. To this end, we propose an end-to-end method for video salient object ranking (VSOR), with two novel modules: an intra-frame adaptive relation (IAR) module to learn the spatial relation among the salient objects in the same frame locally and globally, and an inter-frame dynamic relation (IDR) module to model the temporal relation of saliency across different frames. In addition, to address the limited video types (just sports and movies) and scene diversity in the existing VSOR dataset, we propose a new dataset that covers different video types and diverse scenes on a large scale. Experimental results demonstrate that our method outperforms state-of-the-art methods in relevant fields. We will make the source code and our proposed dataset available.
Submitted 31 March, 2022;
originally announced March 2022.
-
Efficient, ever-ready quantum memory at room temperature for single photons
Authors:
Anthony C. Leung,
W. Y. Sarah Lau,
Aaron D. Tranter,
Karun V. Paul,
Markus Rambach,
Ben C. Buchler,
** Koy Lam,
Andrew G. White,
Till J. Weinhold
Abstract:
Efficient quantum memories will be an essential building block of large-scale networked quantum systems and provide a link between flying photonic qubits and atomic or quasi-atomic local quantum processors. To provide a path to scalability, it is imperative to avoid bulky, difficult-to-maintain systems such as high vacuum and low-temperature cryogenics. Memory efficiencies above 50% are required to operate above the quantum no-cloning limit. Such high efficiencies have only been achieved in systems with photon sources tailored to the memory bandwidth. In this paper we explore the combination of an ultralow spectral bandwidth source of single photons from cavity-enhanced spontaneous parametric down-conversion with a gas-ensemble atomic memory. Our rubidium vapour gradient echo memory achieves 84$\pm$3% recall efficiency of single photons: a record for an always-ready, hot, and vacuum-system-free optical memory.
Submitted 29 March, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Bi-directional Object-context Prioritization Learning for Saliency Ranking
Authors:
Xin Tian,
Ke Xu,
Xin Yang,
Lin Du,
Baocai Yin,
Rynson W. H. Lau
Abstract:
The saliency ranking task is recently proposed to study the visual behavior that humans would typically shift their attention over different objects of a scene based on their degrees of saliency. Existing approaches focus on learning either object-object or object-scene relations. Such a strategy follows the idea of object-based attention in Psychology, but it tends to favor those objects with strong semantics (e.g., humans), resulting in unrealistic saliency ranking. We observe that spatial attention works concurrently with object-based attention in the human visual recognition system. During the recognition process, the human spatial attention mechanism would move, engage, and disengage from region to region (i.e., context to context). This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. Our model includes two novel modules: (1) a selective object saliency (SOS) module that models object-based attention via inferring the semantic representation of the salient object, and (2) an object-context-object relation (OCOR) module that allocates saliency ranks to objects by jointly modeling the object-context and context-object interactions of the salient objects. Extensive experiments show that our approach outperforms existing state-of-the-art methods. Our code and pretrained model are available at https://github.com/GrassBro/OCOR.
Submitted 22 March, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Probing the Inner Circumgalactic Medium and Quasar Illumination around the Reddest `Extremely Red Quasar' (ERQ)
Authors:
Marie Wingyee Lau,
Fred Hamann,
Jarred Gillette,
Serena Perrotta,
David S. N. Rupke,
Dominika Wylezalek,
Nadia L. Zakamska
Abstract:
Dusty quasars might be in a young stage of galaxy evolution with prominent quasar feedback. A recently discovered population of luminous, extremely red quasars at $z\sim$~2--4 has extreme spectral properties related to exceptionally powerful quasar-driven outflows. We present Keck/KCWI observations of the reddest known ERQ, at $z=$\,2.3184, with an extremely fast [\ion{O}{III}]~$λ$5007 outflow at $\sim$6000~km~s$^{-1}$. The Ly$α$ halo spans $\sim$100~kpc. The halo is kinematically quiet, with velocity dispersion $\sim$300~km~s$^{-1}$ and no broadening above the dark matter circular velocity down to the spatial resolution $\sim$6~kpc from the quasar. We detect spatially-resolved \ion{He}{II}~$λ$1640 and \ion{C}{IV}~$λ$1549 emissions with kinematics similar to the Ly$α$ halo and a narrow component in the [\ion{O}{III}]~$λ$5007. Quasar reddening acts as a coronagraph allowing views of the innermost halo. A narrow Ly$α$ spike in the quasar spectrum is inner halo emission, confirming the broad \ion{C}{IV}~$λ$1549 in the unresolved quasar is blueshifted by $2240$~km~s$^{-1}$ relative to the halo frame. We propose the inner halo is dominated by moderate-speed outflow driven in the past and the outer halo dominated by inflow. The high central concentration of the halo and the symmetric morphology of the inner region are consistent with the ERQ being in an earlier evolutionary stage than blue quasars. The \ion{He}{II}~$λ$1640/Ly$α$ ratio of the inner halo and the asymmetry level of the overall halo are dissimilar to Type~II quasars, suggesting unique physical conditions for this ERQ that are beyond orientation differences from other quasar populations. We find no evidence of mechanical quasar feedback in the Ly$α$-emitting halo.
Submitted 2 August, 2022; v1 submitted 11 March, 2022;
originally announced March 2022.
-
A photonic chip-based machine learning approach for the prediction of molecular properties
Authors:
Hui Zhang,
Jonathan Wei Zhong Lau,
Lingxiao Wan,
Liang Shi,
Hong Cai,
Xianshu Luo,
Patrick Lo,
Chee-Kong Lee,
Leong-Chuan Kwek,
Ai Qun Liu
Abstract:
Machine learning methods have revolutionized the discovery process of new molecules and materials. However, the intensive training process of neural networks for molecules with ever-increasing complexity has resulted in exponential growth in computation cost, leading to long simulation time and high energy consumption. Photonic chip technology offers an alternative platform for implementing neural networks with faster data processing and lower energy usage compared to digital computers. Photonics technology is naturally capable of implementing complex-valued neural networks at no additional hardware cost. Here, we demonstrate the capability of photonic neural networks for predicting the quantum mechanical properties of molecules. To the best of our knowledge, this work is the first to harness photonic technology for machine learning applications in computational chemistry and molecular sciences, such as drug discovery and materials design. We further show that multiple properties can be learned simultaneously in a photonic chip via a multi-task regression learning algorithm, which is also the first of its kind, as most previous works focus on implementing a network in the classification task.
Submitted 25 December, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Event-based clinical findings extraction from radiology reports with pre-trained language model
Authors:
Wilson Lau,
Kevin Lybarger,
Martin L. Gunn,
Meliha Yetisgen
Abstract:
Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, count, etc. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9%-93.4% F1 for finding triggers and 72.0%-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1%-89.7% for argument roles, demonstrating that the model generalized well to cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.
Submitted 27 December, 2021;
originally announced December 2021.
-
Highly active hydrogen evolution facilitated by topological surface states on a Pd/SnTe metal/topological crystalline insulator heterostructure
Authors:
Qing Qu,
Bin Liu,
Wing Sum Lau,
Ding Pan,
Iam Keong Sou
Abstract:
Recently, topological quantum materials have emerged as promising electrocatalysts for the hydrogen evolution reaction (HER). However, the performance of most of them lags well behind that of noble metals such as the benchmark platinum (Pt). In this work, a Pd(20nm)/SnTe(70nm) heterostructure, fabricated by molecular beam epitaxy and electron beam evaporation, is found to display much higher electrocatalytic activity than that of a pure Pd(20nm) thin film and even higher than that of a commercial Pt foil. This heterostructure exhibits an extracted turnover frequency more than two times higher than that of the Pd(20nm) thin film at a potential of 0.2 V, indicating a much higher intrinsic activity per Pd site. Density functional theory calculations show that the conventional d-band theory, which works well for many transition metal heterostructures, cannot explain the enhancement of electrocatalytic performance. Instead, we found that the topological surface states (TSSs) of the SnTe (001) underlayer play a key role; electrons transfer from both the Pd surface and the adsorbed H atoms to the TSSs of SnTe (001), resulting in weaker Pd-H binding strength and more favorable hydrogen adsorption free energies. Our work demonstrates for the first time that a metal/topological quantum material heterostructure can be a prominent catalyst with HER activity outperforming that of a commercial Pt foil, and it offers a promising direction for optimizing the performance of electrocatalysts based on topological quantum materials.
Submitted 1 September, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Geometry-aware Two-scale PIFu Representation for Human Reconstruction
Authors:
Zheng Dong,
Ke Xu,
Ziheng Duan,
Hujun Bao,
Weiwei Xu,
Rynson W. H. Lau
Abstract:
Although PIFu-based 3D human reconstruction methods are popular, the quality of recovered details is still unsatisfactory. In a sparse (e.g., 3 RGBD sensors) capture setting, the depth noise is typically amplified in the PIFu representation, resulting in flat facial surfaces and geometry-fallible bodies. In this paper, we propose a novel geometry-aware two-scale PIFu for 3D human reconstruction from sparse, noisy inputs. Our key idea is to exploit the complementary properties of depth denoising and 3D reconstruction, for learning a two-scale PIFu representation to reconstruct high-frequency facial details and consistent bodies separately. To this end, we first formulate depth denoising and 3D reconstruction as a multi-task learning problem. The depth denoising process enriches the local geometry information of the reconstruction features, while the reconstruction process enhances depth denoising with global topology information. We then propose to learn the two-scale PIFu representation using two MLPs based on the denoised depth and geometry-aware features. Extensive experiments demonstrate the effectiveness of our approach in reconstructing facial details and bodies of different poses and its superiority over state-of-the-art methods.
Submitted 27 September, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.